Helix AI Intern, Speech [Winter/Summer 2026]

San Jose, CA

Job Description

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.

We are looking for a Helix AI Intern, Speech for Winter 2026 to contribute to the design and optimization of the real-time speech pipeline that powers natural voice interaction with our humanoid robot. This role offers hands-on experience at the intersection of audio systems, AI, and robotics, working on challenges such as low-latency audio streaming, speech enhancement, and real-time speech understanding.

This internship is designed for students in their final year of an undergraduate or master’s program, as well as recent graduates who are on track to complete their degree by the end of 2026 or the following year.

Responsibilities:

Support the development and testing of real-time audio and speech streaming pipelines
Contribute to the integration of low-latency, full-duplex audio systems using WebRTC or similar frameworks
Assist in evaluating or deploying AI-based components that improve speech quality, intelligibility, or responsiveness
Collaborate with AI, audio, and robotics engineers to enhance the reliability and performance of speech systems
Help build tools for monitoring, debugging, and visualizing live audio and speech pipeline performance

Requirements:

Undergraduate student (Senior) or recent graduate in Computer Science, Electrical Engineering, or a related field
Minimum 10-week internship; 1 to 2 terms preferred
Strong programming skills in Python or C++
Familiarity with real-time communication frameworks (WebRTC, gRPC, or WebSockets)
Understanding of digital audio fundamentals (sampling, latency, buffering, SNR, AEC)
Basic knowledge of machine learning concepts and experience deploying or using pre-trained models
Strong verbal and written communication skills

Bonus Qualifications:

Experience with audio ML frameworks (PyTorch, torchaudio, ONNX Runtime)
Familiarity with speech enhancement or ASR/TTS systems
Knowledge of asynchronous or multithreaded programming (asyncio, coroutines, or similar)
Exposure to cloud or edge-based audio processing systems
Interest in humanoid robots and real-time human–robot communication

The US hourly range for this internship position is $40 to $50 per hour.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
