Software Engineer I - AI/ML, AWS Neuron Distributed Training
Posted 1 day ago
Job Description
Annapurna Labs designs silicon and software that accelerates innovation. Our custom chips, accelerators, and software stacks enable us to tackle unprecedented technical challenges and deliver solutions that help customers change the world. AWS Neuron is the complete software stack powering AWS Trainium (Trn2/Trn3), our cloud-scale machine learning accelerators. We are seeking a Senior Software Engineer to join our ML Distributed Training team.
In this role, you will be responsible for the development, enablement, and performance optimization of large-scale ML model training across diverse model families. This includes massive-scale pre-training and post-training of LLMs with Dense and Mixture-of-Experts architectures, transformer- and diffusion-based multimodal models, and Reinforcement Learning workloads. You will work at the intersection of ML research and high-performance systems, collaborating closely with chip architects, compiler engineers, runtime engineers, and AWS solution architects to deliver cost-effective, performant machine learning solutions on AWS Trainium-based systems.
Key job responsibilities
You will contribute to the design and implementation of distributed training solutions for large-scale ML models running on Trainium instances. A significant part of your work will involve extending and optimizing popular distributed training frameworks including FSDP, torchtitan, and Hugging Face libraries for the Neuron ecosystem.
A core focus of this role involves developing and optimizing mixed-precision and low-precision training techniques. You will work with BF16, FP8, and emerging numerical formats to improve training throughput while maintaining model accuracy and convergence quality. This includes implementing precision-aware training strategies, loss scaling techniques, and careful gradient management to ensure training stability across reduced precision formats.
Beyond precision optimization, you will profile, analyze, and tune end-to-end training pipelines to achieve optimal performance on Trainium hardware. You will partner with hardware, compiler, and runtime teams to understand system constraints and unlock new capabilities. Additionally, you will collaborate with AWS solution architects and customers to support the deployment and optimization of training workloads at scale.
About the team
Annapurna Labs was a startup acquired by AWS in 2015 and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines, including silicon engineering, hardware design and verification, software, and operations. Products we have delivered over the last few years include AWS Nitro, ENA, EFA, Graviton and F1 EC2 instances, AWS Neuron, Inferentia and Trainium ML accelerators, and scalable NVMe storage.
About the job
Posted on
Jun 2, 2026
Apply before
Jul 2, 2026
Job type
Full-time
Location
US, CA