Figure

Helix AI Intern, Speech [Winter/Summer 2026]

San Jose, CA

Job Description

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.

We are looking for a Helix AI Intern, Speech for Winter 2026 to contribute to the design and optimization of the real-time speech pipeline that powers natural voice interaction with our humanoid robot. This role offers hands-on experience at the intersection of audio systems, AI, and robotics, working on challenges such as low-latency audio streaming, speech enhancement, and real-time speech understanding.

This internship is designed for students in their final year of an undergraduate or master’s program, as well as recent graduates who are on track to complete their degree by the end of 2026, or the following year.

Responsibilities:

  • Support the development and testing of real-time audio and speech streaming pipelines
  • Contribute to the integration of low-latency, full-duplex audio systems using WebRTC or similar frameworks
  • Assist in evaluating or deploying AI-based components that improve speech quality, intelligibility, or responsiveness
  • Collaborate with AI, audio, and robotics engineers to enhance the reliability and performance of speech systems
  • Help build tools for monitoring, debugging, and visualizing live audio and speech pipeline performance

Requirements:

  • Undergraduate student (Senior) or recent graduate in Computer Science, Electrical Engineering, or a related field
  • Minimum 10 weeks internship, 1 to 2 terms preferred
  • Strong programming skills in Python or C++
  • Familiarity with real-time communication frameworks (WebRTC, gRPC, or WebSockets)
  • Understanding of digital audio fundamentals (sampling, latency, buffering, SNR, AEC)
  • Basic knowledge of machine learning concepts and experience deploying or using pre-trained models
  • Strong verbal and written communication skills

Bonus Qualifications:

  • Experience with audio ML frameworks (PyTorch, torchaudio, ONNX Runtime)
  • Familiarity with speech enhancement or ASR/TTS systems
  • Knowledge of asynchronous or multithreaded programming (asyncio, coroutines, or similar)
  • Exposure to cloud or edge-based audio processing systems
  • Interest in humanoid robots and real-time human–robot communication

The US hourly range for this internship position is $40 to $50 per hour.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.


About the job

Posted on: Oct 21, 2025

Apply before: Nov 20, 2025

Job type: Full-time
