Sr Software Development Engineer, EC2 Nitro Machine Learning Systems
Posted 30 days ago
Job Description
This job posting has expired and is no longer accepting applications.
EC2 Nitro drives the planet’s largest, fastest-growing and most feature-rich compute cloud. Nitro is AWS's ground-up design for virtualization at global scale, built on a fully custom stack of hardware, firmware and applications. Nitro has enabled EC2 to support Intel, AMD and Amazon’s custom silicon (the Graviton processor family) while raising the industry bar for security and performance across our product line.
We integrate hardware, firmware, application software and services to deliver new virtualized and bare-metal compute platforms for companies from startups through the Fortune 500. We are looking for an experienced leader to drive software development and scaling for new EC2 compute platforms. In this role, you will work with a broad and deep group of technical teams that develop hardware, firmware, systems and application software.
The ideal candidate is expected to have a solid understanding of computer science fundamentals and expertise in C, C++, or Rust development in a Linux environment. Experience with Linux package management, version control systems, automated build processes, and software unit testing is required. In-depth knowledge of ML frameworks and cluster management is highly preferred.
Key job responsibilities
- Design and develop innovative technologies that power the infrastructure supporting machine learning workloads
- Lead technical projects establishing EC2 as the definitive source for ML performance best practices across diverse applications including LLMs, multimodal systems, and emerging model architectures
- Develop and maintain comprehensive regression testing systems that validate performance across major component releases including frameworks, firmware, drivers, and networking infrastructure
- Collaborate with hardware engineering teams to influence future platform designs based on performance insights gathered from state-of-the-art research and customer workloads
- Build customer relationships by investigating complex performance challenges, developing solutions, and publishing actionable best practices through multiple channels
About the team
The EC2 Nitro Machine Learning Systems team is responsible for development, operations, and maintenance of scale-out machine learning platforms used for training and inference workloads. We build and optimize the infrastructure that powers some of the most computationally intensive AI/ML workloads in the cloud. Our team is passionate about creating reliable, high-performance systems that enable customers to push the boundaries of what's possible with machine learning.
Working with us means having the opportunity to influence the future of supercomputing in the cloud while solving complex technical challenges at massive scale. We collaborate closely with customers and internal teams to continuously improve our platforms and deliver innovations that accelerate machine learning workflows.
Amazon
About the job
Posted on: Mar 12, 2026
Apply before: Apr 11, 2026
Job type: Full-time
Category: Machine Learning
Location: US, WA