Job Description
About Luma AI:
Luma’s mission is to build multimodal AGI. Based on our research on video, 3D, and now multimodal models, we believe that AI needs to be jointly trained over all signal modalities – text, video, audio, images – analogous to the human brain.
To advance our mission, we build and operate the full stack end-to-end, spanning foundation models, inference systems, and products. This integrated approach powers technologies like Ray3, which is seeing rapidly growing adoption among Fortune 500 companies across media, entertainment, and advertising. Backed by a recent $900M Series C and our partnership with Humain to build a 2 GW compute supercluster (Project Halo), our models and the Dream Machine platform are now enabling creatives worldwide to tell some of the most impactful stories of our time.
Where You Come In:
This is a rare and foundational opportunity to define the future of creative AI. You will be at the forefront of building and training large-scale multimodal generative models, directly impacting how users create and interact with video and audio. This role offers the chance to bridge cutting-edge research with magical, shipped products, working end-to-end on novel problems with no existing playbook.
What You'll Do:
This opportunity involves both the “science” and “engineering” parts of research, so feel free to choose the title you think is appropriate for you (Research Scientist or Research Engineer).
This is a multi-stack opportunity where you will work at the intersection of modeling, data, systems, and evaluation.
- Modeling: Architect large-scale video and audio generative models, focusing on strong temporal coherence and high perceptual quality.
- Data: Design, implement, and run robust data pipelines for curating, filtering, and captioning massive video and audio datasets.
- Systems: Train large-scale video and audio generative models on massive datasets and GPU clusters.
- Evaluation: Define and build novel evaluation frameworks to measure realism, temporal consistency, controllability, and human-aligned creative quality.
Who You Are:
- Strong foundation in machine learning and generative modeling, with experience in video, audio, or multimodal domains.
- Deep understanding of autoregressive, diffusion/flow-based, or hybrid generative models, and their tradeoffs for long-horizon generation.
- Hands-on experience with PyTorch and large-scale training (distributed, mixed precision, large datasets).
What Sets You Apart (Bonus Points):
Experience with data, modeling, or evaluation for any of the following:
- Text-to-video/audio models
- Vision language models
- Audio language models
Your application is reviewed by real people.
About the job
Posted on: Feb 5, 2026
Apply before: Mar 7, 2026
Job type: Full-time
Category: Research Scientist
Location: Palo Alto, CA