AMD
Company
4 days ago
Senior AI/ML Infrastructure Engineer
Austin, Texas
Full-time
Job Description
WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE ROLE:
AMD is looking for a specialized software engineer who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:
The ideal candidate should be passionate about software engineering and possess leadership skills to drive sophisticated issues to resolution. The candidate should be able to communicate effectively and work well with different teams across AMD.

KEY RESPONSIBILITIES:
Architect and maintain robust, scalable infrastructure for training and deploying machine learning and large language models, ensuring optimal performance.
Collaborate with AI researchers, data scientists, and software engineers to streamline the end-to-end AI model lifecycle, from development to deployment and monitoring.
Design, develop, and fine-tune large-scale language models and other deep learning models for various applications.
Implement and manage CI/CD pipelines for AI models, facilitating continuous integration, continuous deployment, and continuous training practices.
Monitor the performance of machine learning and large language models, identifying and addressing issues related to data drift, model degradation, and resource constraints.
Develop and enforce best practices for version control, testing, and deployment of AI models, ensuring compliance with industry standards and regulatory requirements.
Optimize computing resources for training and inference processes, leveraging cloud technologies and on-premises solutions.
Stay updated with the latest advancements in AI/ML technologies, tools, and practices, integrating them into our operations to enhance efficiency and effectiveness.
Implement best practices in model training, including managing overfitting and underfitting and ensuring model generalizability across various domains.
Fine-tune models for specific tasks or industries using targeted techniques and adapt models to new domains or applications.
Develop and maintain tools and frameworks to streamline the model training, validation, and deployment process.
Document methodologies, processes, and findings; effectively communicate complex technical information to both technical and non-technical stakeholders.
Mentor junior team members and contribute to the team's collective knowledge and expertise in deep learning and AI.

PREFERRED EXPERIENCE:
Software Development (Systems Engineering Focus): Proven experience in designing, developing, and maintaining robust software systems, with a deep understanding of performance, scalability, and reliability.
ML Ops Expertise: Hands-on experience in deploying, monitoring, and managing machine learning models in production environments, including automation of pipelines and CI/CD practices.
Strong proficiency in Python and familiarity with deep learning frameworks like TensorFlow, PyTorch, and Keras.
Problem-Solving: Demonstrated ability to troubleshoot complex issues, resolve critical bottlenecks, and drive root cause analysis under time-sensitive conditions.
Cloud & Infrastructure Knowledge: Familiarity with cloud platforms (AWS, Azure, GCP) and containerization/orchestration technologies (Docker, Kubernetes).
Understanding of the ethical considerations and security implications of deploying AI models, particularly large language models.
Collaboration & Communication: Strong cross-functional collaboration skills with the ability to clearly communicate technical concepts to both technical and non-technical stakeholders.
Continuous Learning & Adaptability: Proven track record of quickly adapting to new technologies, tools, and methodologies in a fast-paced environment.

ACADEMIC CREDENTIALS:
Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.