Machine Learning Engineer (Training Optimization)
Posted 22 days ago
Job Description
Company Description
About the Group/Team
We're the CORE team within the Generative AI supergroup. Our mission is to invent foundational technologies that will power the future of AI-assisted design. From large-scale models to groundbreaking research, our team builds the technical core of Canva’s creative intelligence engine. We collaborate globally to ship research that makes a real impact—from smart editing to AI video tools—at massive scale.
About the Role/Specialty
As a Machine Learning Engineer, you’ll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You’ll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton—pushing the limits of performance across compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products.
What you’ll do (responsibilities)
- You’ll design, implement, and optimize large-scale machine learning systems for training.
- You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
- You’ll partner with research and modeling teams to align systems with algorithmic needs.
- You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
- You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
- You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.
Qualifications
What we're looking for
We’re looking for a systems-first engineer who thrives in fast-paced, high-impact environments. You’re deeply familiar with distributed model training at scale and understand the nuances of optimizing compute at every level of the stack. You're excited by challenges that stretch current boundaries, and you’re a strong collaborator who communicates clearly across domains.
- Strong background in LLMs, multimodal AI, or diffusion models.
- Proficiency in Python. Familiarity with a systems programming language (e.g. C++ or Rust) is a plus.
- Deep knowledge of PyTorch or JAX as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
- Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
- Hands-on experience writing custom GPU kernels in CUDA or Triton.
- Excellent communication and problem-solving skills, including full proficiency in English.
Additional Information
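Large-model training optimization engineer (multimodal/image generation). Technical requirements: operator (kernel) optimization, distributed training, GPU clusters, and training frameworks. This role is open to candidates at all experience levels, including experienced hires and 2026 and 2027 graduates; internship positions are also available.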
Canva