Member of Technical Staff (Research Engineer - LLM Systems & Performance)
Job Description
This job posting has expired and is no longer accepting applications.
About Contextual AI
We're revolutionizing how AI Agents work by solving AI's most critical challenge: context. The right context at the right time unlocks the accuracy and production scale that enterprises leveraging AI require. Our enterprise AI development platform sits at the intersection of breakthrough AI research and practical developer needs. Our end-to-end platform allows AI developers to easily and accurately ingest and query documents from enterprise data sources and easily embed retrieval results into their business workflows.
Contextual AI was founded by the pioneers of Retrieval-Augmented Generation (RAG), the foundational technique behind the context layer, connecting foundation models to current and relevant information. Backed by the industry's most forward-thinking venture capitalists, we're not just participating in the enterprise AI revolution, we're defining it. Join us in building a future where AI doesn't just answer questions, it transforms businesses.
About the role
As a Member of Technical Staff (Research Engineer – LLM Systems & Performance), you will be part of a small, high-impact team building and optimizing LLM systems end-to-end, from Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) pipelines to high-throughput inference clusters in production. You will collaborate closely with researchers and engineers to develop advanced models and infrastructure for the context layer.
What you'll do
- Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation.
- Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations.
- Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using profilers such as Nsight to identify and fix bottlenecks.
- Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters.
- Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference.
- Write and optimize GPU kernels in CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate.
- Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production.
- Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform, and Product).
What we're seeking
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience).
- Strong programming skills in Python.
- Experience with at least one major ML framework: PyTorch or JAX.
- Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.).
- Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching).
- Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency.
- Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers.
Equal Opportunity
Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.
About the job
Dec 8, 2025 – Jan 7, 2026