Wayfair operates a growing 24/7/365 Production Operations Center, and we are looking for talented DevOps Engineers with great communication skills to join our team. We are a proactive group of engineers looking for candidates who are willing to do more than just respond to issues. We thrive on working smart, not hard, and always look for opportunities to automate as much of our operations as possible. We build our own alerts, find anomalies, fix issues, and ask why things happen.
What You’ll Do:
- Do your best even when no one is watching.
- Proactively solve problems and prevent them from happening again.
- Respond to issues of all sizes, from major outages to minor alerts, and fix or escalate as needed.
- Identify personal KPIs and goals that help the organization, and execute on them.
- Work with subject matter experts to learn new skills and share with the team.
- Create and tweak alerts to improve our monitoring.
- Learn the concepts of incident coordination and how to handle large-scale issues the company might face.
- Write and refine our troubleshooting guides and documentation.
What You’ll Need:
- Experience in DevOps or a responsive engineering role
- Strong background in Linux and/or Windows administration
First-hand experience with:
- Monitoring platforms: Datadog, Zabbix, or similar
- Logging platforms: Elasticsearch, Logstash, and Kibana (ELK stack)
- Time-series data platforms: Graphite, InfluxDB, Grafana
- Caching technology: Redis, Memcached
- Configuration management tools: Puppet, Consul, SaltStack
- Ticketing systems: ServiceNow, Jira, or similar
- Knowledge of CI/CD pipelines and version control software and concepts (preferably Git)
- Strong understanding of HTTP and TCP/IP
- Excellent communication skills, including the ability to translate between “Engineering” and “Business”