CloudWalk
Machine Learning Engineer (Distributed Training)
Remote
Job Description
Who we are:
CloudWalk is a fintech company reimagining the future of financial services. We are building intelligent infrastructure powered by AI, blockchain, and thoughtful design. Our products serve millions of entrepreneurs across Brazil and the US every day, helping them grow with tools that are fast, fair, and built for how business actually works. Learn more at cloudwalk.io.
Who We’re Looking For:
We’re looking for a Machine Learning Engineer to own and evolve our distributed training pipeline for large language models. You’ll work inside our GPU cluster to help researchers train and scale foundation models using frameworks like Hugging Face Transformers, Accelerate, DeepSpeed, FSDP, and others. Your focus will be distributed training: from designing sharding strategies and multi-node orchestration to optimizing throughput and managing checkpoints at scale.
This role is not research: it's about building and scaling the systems that let researchers move fast and models grow big. You’ll work closely with MLOps, infra, and model developers to make our training runs efficient, resilient, and reproducible.
What You'll Do:
What We’re Looking For:
Bonus Points:
How We Hire:
If you’ve trained LLMs before, or helped others do it better, this role is for you. Even if you don’t check every box, we want to hear from you if you’re confident working with distributed compute and real-world LLM workloads.