Site Reliability Engineer (Remote U.S. - Eastern or EU)

Authzed

Date listed: 2 months ago

Employment Type: Full time

Remote: Yes

Found on: YCombinator Startups

Keywords: remote docker sql nodejs kubernetes grafana github python prometheus terraform ruby

Job Summary:

We are seeking a Site Reliability Engineer to join our tech startup in the infrastructure and authorization space. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our systems. You will be responsible for designing, implementing, and maintaining scalable infrastructure solutions to support our growing customer base. This is an exciting opportunity to work in a fast-paced environment and contribute to the success of a company bringing a Google-inspired authorization system to companies around the globe.

Responsibilities:

Design, implement, and maintain highly available and scalable infrastructure solutions for our projects, products, and customers.
Monitor and analyze system performance, identifying and resolving bottlenecks and issues to ensure optimal performance and reliability.
Automate infrastructure deployment and configuration management processes.
Continuously improve system reliability, security, and efficiency through proactive monitoring, capacity planning, and performance tuning.
Troubleshoot and resolve complex infrastructure and application issues in production and test environments.
Collaborate with software engineering teams to design and implement systems that are resilient, scalable, and secure.
Participate in the on-call rotation and respond to production incidents in a timely manner.
Document system configurations, troubleshooting procedures, and operational guidelines.

Requirements:

Proven experience as a Site Reliability Engineer or in a similar role.
Strong understanding of networking, operating systems, and cloud infrastructure.
Experience with Site Reliability Engineering, System Design, and Distributed Computing.
Experience with multiple programming languages; we currently have SDKs for NodeJS, Java, Python, Ruby, and Go.
Experience with containerization technologies such as Docker and Kubernetes.
Knowledge of infrastructure-as-code tools like Terraform and Pulumi.
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Experience with lower-level implementation details of relational databases (bonus if you have experience with distributed SQL databases like Google Cloud Spanner or CockroachDB).
Experience working with Git and GitHub.
Experience with continuous integration and deployment systems.
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration abilities.