Principal AI Researcher/Engineer (LLM reinforcement learning)
Job Description
This job posting has expired and is no longer accepting applications.
WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

Join AMD Silo AI’s Base Models team to work on open source post-training. You will turn strong base checkpoints into assistant-grade models through supervised fine-tuning, reward modeling, and RL, while keeping multilingual and low-resource performance front and center.

THE ROLE

Design, implement and tune post-training methods (SFT, DPO/PPO/GRPO, RLVR) on large-scale HPC clusters.
Develop high-throughput synthetic-data pipelines with verifiable results.
Integrate relevant metrics with the Evaluation team to enable rapid feedback loops.
Publish code, datasets and training recipes under permissive licenses; upstream improvements to TRL or similar frameworks.
Collaborate on OpenEuroLLM post-training efforts.

Collaboration with others

Pre-training team: align on checkpoint hand-offs, data-mix insights and long-context plans.
Evaluations & Benchmarking: collaborate on post-training metrics to ensure we are measuring what matters and targeting improvements where they are needed.
Dev infra: set requirements for experiment tracking, logging and job orchestration.
OpenEuroLLM consortium: collaborate on post-training deliverables, share expertise and experimental results, and co-publish results.
Main goals for the first 6 months

Familiarize yourself with existing post-training tooling, data and pain points; help develop a roadmap to improve post-training performance and broaden language support.
Get involved with the OpenEuroLLM post-training workgroup; help define project goals and implement solutions that scale to broad language support.
Plan and execute improvements to our open source post-training pipeline.

THE PERSON

Candidates should have:
Proven experience in post-training environments, especially with reward modeling and running RL at scale.
A track record of engineering success or research contributions: publications, patents, open source releases, or academic/industrial collaborations.
A solid foundation in statistics, optimization and error analysis.
Willingness to take intellectual risks and tackle open-ended challenges head-on.
Familiarity with industrial research workflows and modern software engineering practices.
Documented communication skills and the ability to mentor peers.

We would like to see:
Experience with multilingual post-training settings.
Prior work on reward models.
Working knowledge of more than one language.
A post-graduate degree in a relevant field (ML, NLP, RL).

LOCATION: Remote from Finland

Benefits offered are described in AMD benefits at a glance. AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.
We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD
77 jobs posted
About the job
Posted on: Dec 8, 2025
Apply before: Jan 7, 2026
Job type: Full-time
Category: Other AI jobs
Location: Helsinki, Finland
Similar Jobs
Machine Learning Engineer, AI Evaluation | Wayve | London | 26d
Research Engineer, Multimodal Reinforcement Learning | DeepMind | Zurich, Switzerland | 12d
Principal Machine Learning Researcher (Physical AI) | Freeform | $200K - $400K | Los Angeles, CA | 12d
Machine Learning Engineer - LLM Evals + Observability | Glean | $200K - $300K | San Francisco, CA | 12d
AI Deployment Engineer | OpenAI | Paris, France | 27d
Senior Machine Learning Engineer (Multimodal Generative AI) | Code and Theory | Bengaluru, Karnataka, India | 21d
Machine Learning Engineer | Faculty | London | 21d
Machine Learning Engineer | Faculty | London | 20d
Machine Learning Engineer, Reinforcement Learning | DoorDash | $137K - $202K | San Francisco, CA; Sunnyvale, CA; Seattle, WA | 18d
Machine Learning Engineer, AWS Applied AI Solution | Amazon | US, WA | 12d