Sr. Software Engineer, EFA Network ML Software Team - Annapurna Labs
Posted 1 day ago
Job Description
Want to help make the next generation of Machine Learning in the cloud possible? Do you have a laser focus on performance in your team's code? We want to talk to you!
We own the user-space software that makes the Elastic Fabric Adapter (EFA) network card work for Machine Learning (ML) and High-Performance Computing (HPC) customers on AWS. Across multiple projects written in C, our team enables customers to network thousands of instances, across GPU and CPU instance types, to handle the toughest clustered workloads. Lead a dynamic, fast-paced group whose work has a big impact every day on the leading companies in AI and HPC.
Key job responsibilities
You will help lead a team of obsessed developers operating at the highest level of networking. You will write the highest-performing C code for multiple open source projects that support EFA, such as Libfabric and Open MPI. You will work with multiple teams across the stack to invent new APIs for the latest networking concepts in the cloud. Dive deep into how your customers run collectives and messaging at high bandwidth and low latency. Provide expert-level support to some of the biggest names in AI.
A day in the life
Start from the needs of your customer and invent new ways to reduce the occupancy of the EFA software stack. Win over your peers and leadership with excellent written designs. Work with our ML Infrastructure team to see your products perform on clusters of hundreds to thousands of top-end machines.
About the team
We are a fast-paced team that owns the user-space software stack for EFA. As part of Annapurna Labs in AWS, we are very nimble, paying careful attention to what the AI industry will try next and having our products ready for it. We focus heavily on automation, so that operational work is confined to the most interesting problems as customers continuously experiment with what our network can do. Our team is a place of growth that concentrates on your career and goals and motivates you to reach your highest potential.
Amazon
Similar Jobs
ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs | Amazon | US, CA | 18d
ML Compiler Engineer II - Neuron Kernel Interface, Annapurna Labs | Amazon | US, NY | 4d
Software Engineering LMTS- ML Engineer | Salesforce | India | 22d
Senior Software Engineer, ML Infra | Roblox | San Mateo, CA | $197K - $243K | 12d
Principal Software ML Test Engineer | d-Matrix | Santa Clara | $180K - $300K | 12d
Sr. Engineer, Software - AI Compiler | Tenstorrent | Belgrade, RS | 11d
Software Engineer, ML Platform Infrastructure | Nuro | HQ | $160K | 6d
Sr. Software Engineer, AI Compiler | Tenstorrent | Toronto, Ontario, Canada | $100K - $500K | 4d
Software Engineering LMTS- ML Engineer | Salesforce | India | 1d
Sr Software Development Engineer for AI | Workday | Canada, BC, Canada | CA$140K - CA$210K | 26d