Our engineers enjoy reducing or eliminating manual work, are excellent problem solvers, and know that automation is key to operating a large-scale system.
SREs make sure that our applications are highly available and that our Service Level Objectives (SLOs) are met. SREs work directly on scrum teams with our Software Engineers, applying their interest in operations and their development skills to ensure that new features follow SRE best practices and are supportable.
- Monitor and resolve issues in all environments, ensuring that SLOs and uptime targets are met
- Be a member of scrum team(s) to ensure SRE concerns are addressed from the time a feature is designed through its deployment to production
- Engage in capacity planning and demand forecasting, anticipating performance bottlenecks and scaling the environment as needed
- Change management
- Uptime and SLO reporting
- Expert-level knowledge of at least one configuration management system (e.g., Chef, Ansible, Puppet)
- Experience with build automation tools
- Strong knowledge of the Amazon Web Services (AWS) ecosystem and its core services
- Software development background preferred
- Experience with microservice architectures running in containers (Docker or another containerization technology)
- Experience with Docker tooling and its ecosystem
- Experience supporting 24x7, high-availability internet application environments that include web, application, and database servers and load-balancing systems
- Bachelor's degree or higher in Computer Science, or equivalent experience
- AWS certification a plus
- Excellent written and verbal communication skills