Applied AI Researcher, Benchmarking

Remote

Job Description

Distyl AI develops AI-native technologies that let humans and AI collaborate to power the operations of the Global Fortune 1000.

In just 24 months, we’ve rapidly grown to partner with some of the world’s largest enterprises—including F100 telecom, healthcare, manufacturing, insurance, and retail companies—delivering multiple AI deployments with $100M+ impact. Our platform, Distillery, along with our team of AI Engineers, Researchers, and Strategists, is pioneering AI-native systems of work, solving the most complex, high-stakes challenges at scale.

Distyl is founded and led by proven leaders from companies like Palantir, Apple, and top national laboratories. We work in deep partnership with OpenAI, jointly going to market at the largest enterprises and collaborating on evaluating and testing the latest models. Backed by Lightspeed, Khosla, Coatue, industry leaders like Nat Friedman (former GitHub CEO), as well as board members of 20+ F500s, Distyl is building the future of AI-powered enterprise operations.

What We Are Looking For

At Distyl, we’re pushing the envelope of AI utilization in the enterprise. This requires creative researchers who don’t just want to drive incremental improvements on benchmarks or optimize an existing process, but who want to redefine how software is used.

Our researchers come from many academic backgrounds but have strong research track records, operate in an AI-native way, and would be bored staying on the rails of a traditional research org.

Key Responsibilities

The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged.
Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

What We Require

Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance.
Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results.
Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.).
Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done.
Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI.
Biases Towards Showing vs Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize.

What We Offer

An opportunity to advance the cutting edge of LLM research and directly revolutionize work in the enterprise space.
Ownership of high-impact research projects, with the autonomy to explore novel approaches and solutions.
Access to state-of-the-art AI models, real business problems, and proprietary data sets across a diverse range of real-world industries.
Competitive salary and benefits package, including equity options, medical/dental/vision covered at 100% for you and your dependents, a 401(k) plan, and perks such as commuter benefits and lunch provided in the office.
A chance to be part of a mission-oriented company driving practical AI adoption during the biggest revolution in human productivity.
A collaborative and intellectually stimulating environment that encourages innovation and personal growth.

If you are an innovative, ambitious, and driven individual looking to make a difference in the world of AI, we want to hear from you. Apply now to join our team as an Applied AI Researcher and help us shape the future of AI-driven solutions for enterprises across the globe.

Note: Distyl is a hybrid working environment and requires in-office collaboration 3 days a week. We have offices in SF and NYC.
