HPC Engineer (X/F/M) – DACH // Meshcapade

Meshcapade | Berlin, Germany

Date listed: 2 months ago

Employment Type: Full time

Who are we?

Meshcapade is the 3D digital human company. We are creating realistic human avatars for use in apparel, games, fitness, AI, and augmented reality. Using machine learning and computer vision, we model the nuances of human body shape and movement. We automatically convert photos, 3D & 4D scans, RGB-D sequences, Mocap and even words into realistic 3D humans. We are a spin-off from the Max Planck Institute for Intelligent Systems in Tübingen, Germany and our products are powered by state of the art, patented research. Our core product, Meshcapade Me, is an online platform for the creation, animation, and use of 3D digital humans. Our clients run the gamut of global names; a broad mix of tech, media, health and fitness, apparel, and education.

We are seeking a skilled and experienced High-Performance Computing (HPC) Engineer. You will play a crucial role in developing and maintaining our GPU HPC systems that power our cutting-edge human-centric AI research and products. You will be responsible for designing, implementing, and managing the infrastructure that supports our machine learning and scientific computing workloads.

Your day-to-day tasks will include building robust and scalable infrastructure, deploying and managing HPC resources, and automating operational processes. You will apply your deep understanding of MLOps principles and HPC systems to solve complex computational challenges, combining local and cloud-based computing resources. This means you'll be actively involved in executing high-level computational strategies, tracking crucial processing information, and ensuring high data integrity. If you have a passion for pushing the boundaries of computational capabilities and want to contribute to groundbreaking innovations in human-centric machine learning, this is the opportunity for you.

What you will be doing:

  • Design, deploy, and maintain a GPU cluster optimized for large-scale machine learning model training.
  • Collaborate with ML scientists and software engineers to understand computational requirements.
  • Monitor and troubleshoot the GPU cluster, addressing performance bottlenecks and ensuring reliability.
  • Evaluate and recommend local and cloud-based solutions to enhance computational capabilities.
  • Implement and manage job scheduling systems to optimize GPU resource utilization.
  • Collaborate with vendors on the procurement and optimization of local and cloud-based GPU resources.
  • Provide training and support to users on GPU computing best practices and utilization.
  • Stay current with industry trends and emerging technologies in GPU-accelerated computing and machine learning.
  • Contribute to documentation and knowledge sharing within the team.
Who you are:
  • Bachelor's or Master's degree in computer science, electrical engineering, or a related field.
  • Proven experience (5+ years) as an HPC Engineer with a focus on GPU clusters and machine learning.
  • Experience with HPC cluster management and job scheduling software (e.g. HTCondor, Slurm, PBS).
  • Administration experience with Linux OS (e.g. SLES, RHEL, CentOS, Ubuntu).
  • Good knowledge of scripting in Bash and/or Python.
  • Experience with parallel file systems such as GPFS, Lustre, Ceph, BeeGFS, or Weka.
  • Experience with HPC system troubleshooting and support.
  • Experience with interconnect and network technologies (NVLink, InfiniBand).
  • 3+ years of hands-on experience with cloud-based GPU computing (GCP, AWS)
  • Some experience with automation tools for configuration management (e.g. Ansible, Puppet, Chef) and revision control systems (e.g. Git)
  • Experience with containers (Kubernetes, Docker)
  • Familiarity with deep learning frameworks (e.g. PyTorch, TensorFlow) and their GPU optimizations.
  • Deep understanding of storage systems and high-speed interconnects commonly used in HPC environments (local and cloud-based).
  • Ability to work under pressure in a fast-paced environment
  • Strong attention to detail with an analytical mind and outstanding problem-solving skills
  • Great communication skills and an ability to develop effective relationships in a small team
  • Fluency in both German and English is essential
Talent Acquisition Process:
  • Interview with our Talent Acquisition team;
  • Interview with the Engineering team members;
  • Technical Assessment;
  • Debrief and interview with stakeholders
What we offer:
  • A competitive compensation package;
  • Full remote working support;
  • An entrepreneurial team passionate about creating the technology to power the world's avatars;
  • Opportunity to work with an internationally diverse team;
  • Great perks (autonomy, flexible working hours, wellness budget, hardware budget, co-working space allowance and team events).
Diversity isn’t just a statement at Meshcapade; it sits at the core of the company. We believe in diversity of thought because it makes us stronger. We therefore encourage applications from everyone who can bring their unique experience to our collective achievements.
