The Voice Behind GPT-4o: An Inside Look

By Pierson Marks

Voice Mode has quickly become a favorite feature for many ChatGPT users, providing an engaging and interactive way to communicate with the AI. The voices behind this feature—Breeze, Cove, Ember, Juniper, and Sky—were chosen through a meticulous and collaborative process. Here's an exclusive look at how these voices were selected and the unique story behind Sky’s voice.

The Journey to Voice Selection

In September 2023, OpenAI introduced voice capabilities to ChatGPT, marking a significant milestone in enhancing user interactions. This feature was developed with input from professional voice actors, casting directors, and industry advisors. The selection process was extensive, taking five months and involving over 400 submissions from voice and screen actors.

Commitment to the Creative Community

OpenAI is dedicated to supporting the creative community. The company ensured that each voice actor was compensated above industry standards. This commitment extends to fostering respectful collaborations with voice actors and industry professionals to maintain the integrity and quality of the voices used in ChatGPT.

Setting the Criteria

In early 2023, OpenAI collaborated with award-winning casting directors and producers to define the criteria for selecting the voices. The criteria emphasized diversity, timelessness, trustworthiness, warmth, and a natural speaking style. The goal was to find voices that were not only pleasant to listen to but also inspired confidence and engagement.

The Audition Process

The call for talent was issued on May 10, 2023, and within a week, over 400 submissions were received. Actors were asked to record scripts of ChatGPT responses, which covered various topics from mindfulness to daily conversations. The casting team then reviewed these submissions and shortlisted 14 actors for further consideration.

Finalizing the Voices

After in-depth discussions about OpenAI’s vision for human-AI interactions, the final voices for Breeze, Cove, Ember, Juniper, and Sky were selected. Recording sessions were held in San Francisco during June and July 2023. The voices were then integrated into ChatGPT and launched on September 25, 2023.

The Story Behind Sky’s Voice

A notable part of this journey involved Sky’s voice. On May 20, 2024, CEO Sam Altman clarified that Sky’s voice was not that of Scarlett Johansson. The voice actor for Sky was chosen prior to any outreach to Ms. Johansson. Out of respect for her concerns, OpenAI decided to pause using Sky’s voice in its products. This decision underscores OpenAI's stance that AI voices should not imitate a celebrity’s distinctive voice.

More details may emerge as this dispute unfolds, and they should give a clearer picture of what happened.

Future of Voice Mode

With the launch of GPT-4o on May 13, 2024, OpenAI announced an advanced Voice Mode for ChatGPT Plus users. This new mode offers more natural interactions, including better handling of interruptions, effective management of group conversations, background noise filtering, and adaptive tone. Additionally, OpenAI plans to introduce more voices to cater to diverse user preferences.

OpenAI’s process of selecting and integrating these voices reflects its dedication to quality and ethical AI development. As the technology evolves, users can look forward to more personalized and engaging interactions with ChatGPT.

Stay tuned for more updates on GPT-4o and the exciting new voice capabilities coming to ChatGPT.