About the Team

The fleet team manages GPU infrastructure supporting both production and model development workloads. We operate one of the largest cutting-edge GPU fleets in the world, exposing it as a single platform for other OpenAI teams to seamlessly run production Applied AI and training workloads.

We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.

About the Role

You'll drive execution and alignment across hardware, software, and operational teams to scale and maintain a highly performant compute platform. Your responsibilities will span planning, coordination, system design input, and operational excellence to ensure infrastructure readiness for evolving organizational needs. This team supports all the production training and inference compute at OpenAI.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

Guide the roadmap for automation to support future growth of the GPU fleet.
Ensure that incoming clusters are tracked and delivered on time while providing a stable supply signal for the OpenAI fleet.
Collaborate with internal teams to align on business metrics and influence infrastructure strategy.
Consistently partner with GPU users across research and applied-product infrastructure to drive high utilization and optimization opportunities.
Work with strategic partners (product engineering, inference, security, research, and finance) on product launches, big project rollouts, and tooling.
Collaborate with cross-functional (XFN) partners to build long-term, self-service tooling that allows OpenAI to seamlessly manage a growing compute fleet.

You might thrive in this role if you:

Possess a degree in a hard science, or have a demonstrated track record of engineering expertise.
Have 5+ years of experience in program management for major projects, including capital projects or hyperscaler infrastructure deployment.
Can dive into ambiguous technical problem spaces that may involve GPU and AI/ML platform infrastructure.
Have demonstrated the ability to serve as the go-to person solely responsible for driving and delivering complex projects.
Are comfortable managing cross-functional and cross-company teams, and have experience driving information and decision hygiene.
Have an extensive track record of successfully delivering high-profile, technical projects against tight deadlines.
Are technically adept and have effectively partnered with engineering or fundamental research teams of the highest caliber.
Have expertise in designing and implementing simple, scalable processes that solve complex problems.
Have experience managing complicated dependencies such as logistics or supply chains.
Are relentlessly resourceful and thrive in ambiguous, fast-paced environments.
Are interested in and thoughtful about the impacts of AGI.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Technical Program Manager, Fleet Management Systems