Tens of thousands of companies use Metabase every day to answer questions about their data. While we seek to become the de facto self-managed open source analytics software for organizations everywhere, many customers want to use Metabase without worrying about the operational details of self-hosting. That’s why we recently launched Metabase Cloud. We’re looking for operations engineers to help build out and run this new and quickly growing hosted product.
You will:
- Own and operate our application stack and AWS infrastructure to orchestrate and manage our hosted customer instances of Metabase
- Debug runtime issues across the different levels of our application stack, AWS, and the Metabase product
- Build out and improve our observability infrastructure
- Develop and build our internal tooling and automation to manage the lifecycle of a hosted Metabase installation, from purchase to deployment, zero-downtime upgrades, and general operational health
- Continuously improve our automated deployments and testing
We’re looking for someone who:
- Is thoughtful and careful
- Compulsively automates everything!
- Has strong network security and application security skills
- Can write high-quality, readable code in a modern language (e.g. Clojure, Scala, Python, Go)
- Thinks about systems, edge conditions, and their failure modes
- Is able to make solid technical judgements and back them up articulately
- Has experience building and operating production infrastructure on AWS in one or more of the following areas: ECS with Docker, Auto Scaling and Load Balancing, CloudWatch, Aurora and RDS, S3, Secrets Manager, CloudFormation, and Elastic Beanstalk
- Has experience managing production systems with ‘infrastructure as code’ software tools like Terraform
- Has experience running JVM applications under a variety of loads with monitoring, observability, and memory & performance tuning in mind
- Has knowledge of and experience with Clojure or a similar functional language, or a desire to learn
Projects you could work on:
- Simplifying and unifying the infrastructure between our single and multi-tenant hosting platforms, and migrating existing customers
- Building out our observability and monitoring system to unify system, JVM, and application-level metrics
- Collaborating with core application developers on the changes needed to achieve full zero-downtime deployments and upgrades for our hosted customers
- Supporting multiple availability zones and other cloud providers
- Improving the automation of our systems and processes as we work towards compliance certifications such as SOC 2