Site Reliability Engineering Manager - Amsterdam, Ghent, Lodz, Berlin

TomTom | Amsterdam

Date listed

2 months ago

Employment Type

Full time

At TomTom…
You’ll move the world forward. Every day, we create the most innovative mapping and location technologies to shape tomorrow’s mobility for the better.

We are proud to be one team of more than 5,000 unique, curious, passionate problem-solvers spread across the world. We bring out the best in each other. And together, we help the automotive industry, businesses, developers, drivers, citizens and cities move towards a safe, autonomous world that is free of congestion and emissions.

As a Site Reliability Engineering Manager, you will manage your team and partner with internal & external engineering teams to build reliable infrastructure, with automation and tooling to support it. You will have the chance to work through challenging scaling issues, dig in deep to debug the hardest technical problems, work across the stack solving problems and drive unmatched reliability of our systems. You will be the conduit between your SRE team and the business.

What you’ll do

  • Lead a team of Site Reliability and Observability Engineers that provides service, knowledge, guidance and support to the larger TomTom engineering organization.

  • Work with partners to shape the architecture, design, and implementation of new and existing systems to enhance their observability, reliability, efficiency, and scalability.

  • Lead the definition and implementation of the SRE strategy and roadmap.

  • Provide the Incident Commander role in high-priority incidents, getting hands-on when required to improve the TTR.

  • Apply resiliency engineering discipline to avoid recurrence of incidents, and drive incident response, analysis and remediation.

  • Ensure that critical services have a properly configured monitoring and alerting setup and that operational hygiene is applied to guarantee their continuity.

  • Provide observability expertise and enablement to the engineering teams; design, write and maintain software to improve the performance of services and the connected operational profile.

  • Be part of a collaborative environment, working together across many teams to ensure that systems are performing as well as possible.

What you’ll need

  • You have proven experience with people management, talent building and engineering team development

  • You have proven experience of leading technology and process-oriented engineering teams, setting goals and achieving them

  • You have proven experience developing, operating, and troubleshooting large-scale production systems running on public cloud infrastructure (e.g. AWS, Azure)

  • You have experience working with and deep knowledge of operating container orchestration solutions (Kubernetes)

  • You have a deep understanding of observability concepts and experience in technologies, tools and services related to its pillars

  • You have strong scripting and programming skills

  • You have experience working with concepts of infrastructure as code, configuration management, and related technologies

  • You have experience with the incident management process, debriefing of incidents, and deriving corrective and preventive actions

  • You have a good understanding of Unix/Linux systems internals

  • You have a good understanding of SLI, SLO, and SLA concepts

  • You are passionate about automation and eliminating toil

We consider favourably

  • Experience leading an SRE team

  • Experience mentoring, sharing knowledge and acting as a development advocate

  • Experience with tools such as Scalyr, Grafana Cloud, Prometheus, OpenTelemetry

  • Experience with Java, Spring Boot and JVM

  • Expert-level certifications for AWS, Azure

Meet your team
Our team is part of the core TomTom live services. We connect with all DevOps teams to make sure the customer experience is as good as possible when there is an incident and to minimize the MTTR (mean time to resolve). We also focus on reducing the number of incidents by participating in improvement actions centered on automation and reliability setup.

Our Site Reliability Engineers (SREs) are a hybrid of software and systems engineers. We code our way out of operational problems. We are responsible for reliability, scalability, and automation while keeping an eye on latency, performance and capacity as well as other KPIs.

Achieve more
We are self-starters who play well with others. Every day, we solve new problems with creativity, meet new people and learn rapidly at our offices around the world. We will invest in your growth and are committed to supporting you. In everything we do, we’re guided by six values: We care, putting our heart into what we do; we build trust (you can count on us); we create – driven to make a difference; we are confident, but don’t boast; we keep it simple, since life is complex enough; and we have fun because life’s too short to be boring.

After you apply
Our recruitment team will work hard to give you a meaningful experience throughout the process, no matter the outcome. Your application will be screened closely and you can rest assured that all follow-up actions will be thorough, from assessments and interviews through your onboarding.

TomTom is an equal opportunity employer
We celebrate diversity, thrive on each other’s differences and are committed to creating an inclusive environment at our offices around the world. Naturally, we do not discriminate against any employee or job applicant because of race, religion, color, sexual orientation, gender, gender identity or expression, marital status, disability, national origin, genetics, or age.

Ready to move the world forward?
