AMD
CVP FDE – AI Software Development
San Jose, California
Job Description
WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover that the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

This role is not eligible for visa sponsorship.

THE ROLE:
Build and scale a world-class Forward Deployed Engineering (FDE) organization, strategically combining ML generalists, low-level kernel optimizers, and solutions architects to cover the full customer deployment lifecycle. This is a highly visible role with large scope and impact.

THE PERSON:
Define and institutionalize the FDE engagement model to maximize resource leverage and ensure consistent, high-velocity customer outcomes. Serve as the voice of the customer internally: translate field intelligence and customer challenges into concrete, prioritized engineering roadmaps, and ensure execution.

KEY RESPONSIBILITIES:
Cluster Bring-up & Optimization: Oversee the technical onboarding of massive GPU clusters. Ensure your team can troubleshoot collective communication errors, debug framework issues, and optimize training/inference strategies.
Utilization Engineering (The North Star Metric): Drive and maintain industry-leading customer GPU utilization across clusters of thousands of GPUs, making cluster satisfaction the key measure of success.
High-Performance Model Deployment: Enable customer success by deeply optimizing open-source models (Llama 3, DeepSeek, Mixtral) and proprietary models for our specific hardware topology, using tools like vLLM and TensorRT-LLM.
Executive Technical Sponsorship: Act as the technical authority and executive sponsor on large deals, with the credibility to validate architecture with CTOs and VPs of AI.
Feedback Loop: Aggressively channel field intelligence back to Product Engineering. If customers are struggling with a specific use case, become the loudest voice in the room demanding a fix.

PREFERRED EXPERIENCE:
Technical Leadership: Demonstrated track record of leading high-impact technical teams in high-stakes environments (e.g., cloud infrastructure, AI platforms, or HPC).
The "Hardware/Software" Hybrid: You understand the stack from the metal up.
Commercial Acumen & Fluency: Deep understanding of commercial drivers (ARR, churn, margin) and the ability to articulate how technical solutions affect deal velocity and business outcomes.
High-Stakes Crisis Management: Experience leading through "Sev0" customer incidents (e.g., massive training run failures), demonstrating the poise and clarity required to manage executive communication while guiding rapid root-cause resolution.

TECHNICAL COMPETENCY:
AI Frameworks: PyTorch, JAX, TensorFlow.
Distributed Computing: Slurm, Ray, Kubernetes (K8s), Docker.
GPU Ecosystem: NVIDIA drivers, CUDA profiling (Nsight Systems), Triton Inference Server.
LLM Operations (Differentiator): Significant experience with advanced LLM deployment and customization techniques, including fine-tuning (e.g., LoRA/QLoRA) and building RAG pipelines.
ACADEMIC CREDENTIALS:
BS, MS or equivalent with direct experience.

#LI-MH2

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.