We build world models that simulate manipulation scenes faithfully enough to validate, and one day train, policies without touching a robot. You'll develop the generative models that make this work, with the controllability and physical fidelity to match real-robot behavior.
What you'll do:
- Train video and dynamics models: Develop world models with action conditioning for manipulation policies.
- Push long-horizon coherence: Develop architectures and training methods that extend rollout quality on hard physical tasks.
- Own training infrastructure: Run multi-GPU clusters, write custom CUDA, debug at scale.
- Build the world-model data engine: Design, implement, and improve a data engine that allows the world model to compound learning across customers and manipulation tasks.
Requirements:
- Very strong coding skills in Python and PyTorch (or similar).
- Video generation experience: Deep experience training image or video generation models end-to-end.
- Large-scale training: Track record operating training runs at cluster scale.
- 3D vision: Working knowledge of multi-view geometry, scene reconstruction, and physical priors.