Senior Software Engineer - AI Triton Communication
Posted 30 days ago
Job Description
This job posting has expired and is no longer accepting applications.
WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover that the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Senior Software Engineer – AI Triton Communication

THE ROLE:
Triton is a widely adopted language and compiler for high-performance GPU kernels, powering major AI frameworks such as PyTorch, vLLM, and SGLang. As AI workloads increasingly scale across multiple GPUs and nodes, first-class support for distributed execution and communication in Triton is strategically critical to enabling efficient large-scale training and inference on AMD Instinct accelerators. AMD GPUs are an official Triton backend, and delivering industry-leading distributed performance and scalability on AMD Instinct accelerators is a key priority. The performance, scalability, and usability of Triton directly impact the competitiveness of AMD hardware in large-scale AI deployments.

In this role, you will advance the Triton compiler and runtime stack for AMD CDNA and next-generation GPU architectures by building native distributed execution and communication capabilities. You will develop compiler and runtime infrastructure that enables efficient inter-GPU communication, scalable execution, and optimal hardware utilization. You will work across compiler, runtime, and hardware layers, collaborating closely with GPU architecture and software teams to help establish AMD GPUs as a best-in-class platform for Triton-based distributed AI.

THE PERSON:
The ideal candidate has deep expertise in GPU architecture, compiler technologies, and distributed GPU systems, with proven experience optimizing workloads at multi-GPU scale. You are comfortable working across the full execution stack—from compiler and runtime to hardware—and understand how GPU execution, memory hierarchy, and inter-GPU communication affect performance. You have experience working close to the GPU runtime, communication stack, or compiler backend, and are motivated to build native distributed execution and communication capabilities tightly integrated with the compiler and runtime to maximize scalability and hardware utilization. You thrive on solving complex system-level performance challenges and delivering scalable, high-performance GPU infrastructure.

KEY RESPONSIBILITIES:
- Design and develop native distributed communication and execution capabilities within the Triton AMDGPU backend, enabling scalable multi-GPU execution for large-scale AI workloads
- Design and implement Triton compiler and runtime mechanisms for native GPU-initiated communication, including collective operations, remote memory access, synchronization, and distributed execution primitives
- Drive performance optimization across compute and communication, including inter-GPU data movement, communication/computation overlap, memory hierarchy utilization, and GPU-driven scheduling efficiency
- Develop and optimize distributed Triton kernels and execution models to achieve high performance, scalability, and efficient hardware utilization for AI workloads
- Analyze, profile, and debug complex cross-stack issues spanning the Triton compiler, runtime, ROCm stack, and GPU hardware execution
- Collaborate closely with GPU architecture, compiler, runtime, and performance teams to co-design and enable next-generation distributed GPU programming and execution capabilities
- Contribute to the open-source Triton and ROCm distributed ecosystem, driving innovation in distributed GPU computing

PREFERRED EXPERIENCE:
- 5+ years of experience in compiler development, GPU software, distributed systems, or performance engineering
- Familiarity or hands-on experience with the Triton compiler and runtime
- Deep understanding of modern GPU architectures, including the execution model, memory hierarchy (LDS, L2, HBM), scheduling, occupancy, and hardware performance characteristics
- Good understanding of GPU runtime systems, communication stacks, and multi-GPU interconnects such as XGMI, NVLink, PCIe, or InfiniBand, and their performance implications
- Familiarity with distributed GPU communication libraries such as RCCL, NCCL, NVSHMEM, rocSHMEM, MPI, or similar technologies
- Experience developing, optimizing, and scaling workloads across multiple GPUs, including inter-GPU communication, synchronization, and communication/computation overlap
- Strong experience with GPU programming using Triton, HIP, CUDA, or similar parallel programming environments
- Strong knowledge of MLIR and/or LLVM internals
- Experience profiling, debugging, and optimizing performance across compiler, runtime, and hardware layers
- Familiarity with ROCm, HIP, CUDA, or similar GPU programming ecosystems, including performance profiling and optimization tools
- Experience optimizing large-scale AI, machine learning, or HPC workloads across multi-GPU systems
- Experience contributing to open-source projects and working in collaborative, cross-functional engineering environments
- Strong problem-solving, communication, and technical leadership skills

ACADEMIC CREDENTIALS:
Bachelor’s or Master’s Degree in Computer Engineering, Computer Science, Electrical Engineering, or equivalent practical experience

This role is not eligible for visa sponsorship.

#LI-G11 #LI-HYBRID

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess, or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
AMD
About the job
Posted on: Mar 13, 2026
Apply before: Apr 12, 2026
Job type: Full-time
Category: Other AI jobs
Location: San Jose, CA
Similar Jobs
- Senior Software Engineer, AI Solutions | GoodLeap | San Francisco, CA; Roseville, CA; Austin, TX; Irvine, CA (27d)
- Agentic AI - Senior Software Engineer | Mastercard | Dublin, Ireland (21d)
- Senior Software Engineer, AI Products | Airbnb | $191K - $223K | United States (18d)
- AI Empowered Software Senior Engineer | Dell Technologies | Singapore, Singapore (19d)
- Senior AI Software Development Engineer | AMD | Iasi, Romania (17d)
- Senior Software Engineer - AI Platform | Datadog | Paris, France; Sophia Antipolis, France (11d)
- Senior Software Engineer, AI Infrastructure | Robinhood | $196K - $230K | Menlo Park, CA (2d)
- Senior AI Engineer | Datadog | Paris, France (25d)
- Senior AI Engineer | Workday | Israel (18d)
- Senior AI Engineer | Mastercard | Gurgaon, India (13d)