NVIDIA
Senior System Software Engineer - AI Data Platform - Inference Factory Optimization
Job Description
Our team is building the foundational infrastructure that powers NVIDIA's cutting-edge innovations in AI and high-performance computing. We are seeking a Senior Software Engineer to design, build, and optimize highly scalable and reliable automation systems that ensure the peak performance and seamless deployment of our core software offerings across a diverse ecosystem. This is an opportunity to directly impact how AI models and complex applications are validated, tuned, and delivered globally, from cloud environments to on-premises data centers and specialized hardware.
What you’ll be doing
Develop efficient infrastructure and tools for automating complex software processes.
Drive Performance Optimization: Implement advanced test harnesses, benchmarking frameworks, and analytical tools to rigorously characterize and optimize the performance and efficiency of our software and hardware platforms.
Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems.
Work with engineering teams to understand needs, define requirements, and deliver efficient solutions.
Set performance goals, monitor feedback, analyze data, and make continuous improvements for system reliability.
Influence Technical Strategy: Contribute to defining technical strategies and roadmaps for our platform automation initiatives, ensuring alignment with company-wide goals and standard methodologies.
What we need to see
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent experience.
5+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering.
Expertise in System-Level Programming: Proven ability to develop robust tools and automation using programming languages such as C++, Python, or Go.
Deep Understanding of System Software: Experience with operating system internals, device drivers, memory management, and debugging performance issues in complex compute applications.
Distributed Systems: Experience in designing, building, and operating large-scale distributed systems, with knowledge of networking protocols, cluster management, and high-performance interconnects.
Automation and CI/CD Proficiency: Experience building and maintaining automated testing, benchmarking, and continuous integration/continuous deployment pipelines.
Problem-Solving and Analytical Skills: Outstanding analytical, problem-solving, and debugging skills, with a track record of resolving complex technical challenges.
Collaboration and Communication: Excellent interpersonal and communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively across teams.
Ways to stand out from the crowd
Experience optimizing performance for AI/Machine Learning workloads, especially inference applications, on diverse hardware platforms.
Prior experience building or contributing to large-scale compute infrastructure solutions in cloud environments or on-premises data centers.
Experience with containerization and orchestration technologies, such as Docker and Kubernetes.
Familiarity with performance profiling tools and methodologies for hardware and software systems.
Track record of driving significant efficiency gains or architectural improvements in large-scale systems.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.