Hiring Infrastructure Engineer (Elasticsearch) | Roboflow

Infrastructure Engineer (Elasticsearch)


Date listed

3 weeks ago

Employment Type

Full time



Roboflow is rapidly expanding our engineering team to address the groundswell of user and customer needs. Over 100,000 developers (spanning from students to individual hackers & hobbyists to startups to employees of some of the world’s biggest companies) have now used Roboflow to build computer vision projects. Soon, every developer will have computer vision as a tool in their toolbox. Roboflow will be for computer vision what Microsoft was for the PC and Google was for the Internet.

Our core belief is that computer vision is a foundational technology that is going to transform nearly every industry. This is an opportunity to shape how millions of developers will experience and use it for the first time. Your contribution will have a massive impact.

The Opportunity

Roboflow is scaling rapidly. We now manage over 100 million images for hundreds of thousands of users. Having secure and reliable cloud infrastructure to support our growth is of paramount importance.

The Roboflow product spans the entire end-to-end machine vision pipeline, so the infrastructure naturally presents a wide range of challenges: from driving efficiencies in GPU batch computing, to shaving milliseconds off the latency of our hosted machine learning inference APIs, to supporting hundreds of thousands of users worldwide with best-in-class site reliability and data protection.

Our infrastructure runs across AWS and GCP. Our core web-app runs on Firebase (Firestore, Functions, Storage, Hosting). We heavily utilize serverless compute products where possible, but also run clusters of GPU-powered machines on AWS Batch and in managed instance groups fed by pub-sub queues when necessary. We are increasingly using Kubernetes internally, and are working on a self-hosted version of our platform.

The Role

The focus of this role is on improving, scaling, and maintaining the infrastructure that powers our core app, including our cloud architecture, databases, file storage, search cluster, micro-services, and machine learning pipelines.

You'll work alongside our existing infrastructure team and on cross-team projects spanning product, operations, and customer-facing work. You should be able to context switch across a wide range of infrastructure, security, and systems engineering work in a fast-paced startup environment.

Specific Skillset

We're looking to augment our existing team with someone who has deep experience managing a large Elasticsearch cluster that indexes machine learning vectors, which power many features in our product, including dataset search. Migrating from Elastic Cloud to a self-hosted Elastic cluster is a top priority. We estimate that about 20% of your time will be focused on our Elastic cluster, and you will be our resident Elastic expert.

Additionally, experience with some or all of the following would be helpful:

  • Infrastructure-as-code - Terraform, Bash scripting and automation in production environments
  • Site reliability - alerting, monitoring, scaling services in AWS and GCP clouds
  • Node.js and Python programming skills; ability to work with full-stack developers on designing, developing, and operating SaaS applications
  • Experience with machine learning/big data at scale (GPU, Docker and Kubernetes)
  • Awareness of security best practices and of tightening infrastructure for highly secure cloud operations; ideally experience with an ISO 27001 or SOC 2 certification for SaaS applications
  • Experience with CI/CD automation (for example, GitHub Actions or CircleCI)

Findwork Copyright © 2021

