OpenAI launched voice capabilities in ChatGPT last September. Now, the company is previewing a model called Voice Engine, which can use a single 15-second audio clip and a text prompt to generate longer audio in the speaker’s voice. OpenAI boasts that Voice Engine produces life-like voices with inflection and tone, rather than a robotic drone.
According to OpenAI’s blog post, Voice Engine was first developed in 2022 to power ChatGPT’s Read Aloud feature as well as text-to-speech. Since then, OpenAI has tested Voice Engine in a number of different scenarios: reading assistance for children and non-readers; support for non-verbal people and people who have otherwise lost their voice; and translation. For each of these use cases, OpenAI has partnered with companies working in that space.
OpenAI spends a chunk of the blog post assuring readers that Voice Engine is built safely. While it doesn’t explicitly mention the infamous robocall that imitated Joe Biden’s voice, the reference is implied: “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the post states. “We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”
The Biden robocall was likely made with software by ElevenLabs, not an OpenAI product, but Voice Engine might have similar capabilities. At the end of the post, OpenAI states that due to its “approach to AI safety and [their] voluntary commitments” (committing to safety, security, and trust), it’s not releasing Voice Engine widely right now.
“We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models,” the post states. OpenAI then calls for phasing out voice authentication as a security measure; policies to protect the use of people’s voices in AI; public education; and more technology to identify inauthentic voices. All of these measures are needed because of technology like its own.
Read more and listen to examples of Voice Engine in OpenAI’s blog post, Navigating the Challenges and Opportunities of Synthetic Voices.