OpenAI launches GPT-Realtime-2 voice model in API

POSTOP#1@OPENAI Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.

POSTSA#1@SAMA people are really starting to use voice to interact with AI, especially when they have a lot of context to dump. GPT-Realtime-2 comes to the API today; it is a pretty big step forward. (we are working on improvements to voice in chat.)

POSTAD#520@THEREALADAMG https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ “Advancing voice intelligence with new models in the API: A new generation of realtime voice models that can reason, translate, and transcribe as people speak.” https://x.com/TheRealAdamG/status/2052439196413940145/photo/1

QUOTEGB#9@GDB @OPENAI GPT-5-class reasoning for voice agents, now in the API:

QUOTEGB#9@GDB @OPENAI You can now just build amazing voice agents, with the GPT-Realtime-2 reasoning model in our API:

QUOTEOD#83@OPENAIDEVS @OPENAI Voice agents are getting more capable. Here’s what’s new: • GPT-Realtime-2 for voice agents that reason and take action • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages • GPT-Realtime-Whisper, making transcription even faster

QUOTEC🤗#115@CLEMENTDELANGUE @OPENAI who’s adding this to reachy mini?

QUOTESW#206@SIMONW @OPENAI Saw this and thought "yes! ChatGPT voice mode is going to stop acting like a two-year-model" but that upgrade hasn't shipped just yet

QUOTESW#206@SIMONW Sounds like the ChatGPT upgrade is coming soon though https://twitter.com/openai/status/2052438197695877316

QUOTEWD#219@WILLDEPUE @OPENAI this is a really big deal so please ignore the fact that openai decided to name it GPT-Realtime-2 oh my god bidirectional audio is the final step in making audio as an interface stick. talking while listening, jumping in, instant responses are incredible rip turn based audio

QUOTEWD#219@WILLDEPUE @WILLDEPUE and, because i don’t work at openai anymore i can say we’re finally at the technological capability to build Her https://x.com/willdepue/status/2052479371194294532/photo/1

QUOTEWD#219@WILLDEPUE @OPENAI i think audio is honestly a bit like VR: everyone keeps getting excited about it but it doesn’t fully stick as an interface tool use in realtime, reasoning while speaking, live translations are massive steps to getting audio interfaces to take off

QUOTEBP#259@BORISMPOWER @SAMA Voice is a lot more natural for humans and over time AI will shape urself to make the best use of our limited bandwidth to produce the highest value to us

QUOTEBP#259@BORISMPOWER Benchmarks get saturated so quickly ! https://twitter.com/artificialanlys/status/2052486470469140777

QUOTETM#339@ISCIENCELUVR @SAMA yes it will change... in the future people will prefer to interact with AI via brain-computer interfaces

QUOTEER#371@ERICMITCHELLAI @OPENAI So impressed by this model... What will you build with this? What will we build with this?

QUOTEGA#396@GABRIEL1 @OPENAI if 5.5 becomes 20x faster, you'll talk and code live while the interface is changing as you speak

QUOTEGA#396@GABRIEL1 @GABRIEL1 ultra fast voice & language models will be even more important for everyday work use your voice to work, and immediately see the result live on your screen. go between apps, navigate complex interfaces, create custom interfaces to preview, all instant

QUOTESW#464@SHERWINWU Our very first speech model that uses reasoning is now live! The coolest part about this model is how it knows to speak a short preamble (i.e. "hmm.. let me think about that") as its reasoning tokens are going in the background. Kind of like what people do! https://twitter.com/OpenAI/status/2052438196454379986

QUOTESW#464@SHERWINWU @OPENAI The new reasoning audio model is getting a lot of attention, but GPT-Realtime-Translate might be even cooler. It allows for realtime translation of any audio into countless language – a huge ask from businesses! Works well for livestreams, sermons, radio shows, etc...

QUOTERH#514@ROMAINHUET @OPENAI Big day for developers: new realtime audio models are here in the OpenAI API! 🗣️ Fun one to demo: live translation with GPT-Realtime-Translate, and GPT-Realtime-2, our first speech-to-speech reasoning model for voice agents. Voice is becoming an interface you can actually ship.

QUOTEAD#520@THEREALADAMG @THSOTTIAUX big if true

QUOTETI#581@THSOTTIAUX @OPENAI We are assembling AGI in plain sight

QUOTEDU#787@DERYATR_ @OPENAI GPT-Realtime-2 is really real-time voice now! This is via API only for now, but I am sure ChatGPT integration of voice with GPT-5-class reasoning is coming soon. It was long overdue!

QUOTEAL#824@ALTH0U @SAMA bro thinks its about voice 🫵

REPLYSA#1@SAMA @SAMA as a side note, young people seem to prefer to interact with AI via voice, and old people, and people in the middle like to type. i wonder if this will change.

REPLYOD#83@OPENAIDEVS @OPENAIDEVS GPT-Realtime-2 is built for voice agents that need to keep the conversation going while they work. The model is better at harder requests, tool use, recovery behavior, domain-specific language, and tone control while the conversation is happening. We also increased its context window from 32K to 128K, supporting longer conversations and more complex task flows.

REPLYOD#83@OPENAIDEVS @OPENAIDEVS GPT-Realtime-Translate lets you translate speech as it’s spoken. It supports 70+ input languages and 13 output languages, built for live multilingual experiences where people can talk naturally without waiting for a turn-by-turn translation flow.

REPLYOD#83@OPENAIDEVS @OPENAIDEVS GPT-Realtime-Whisper brings low-latency streaming transcription to the Realtime API. Use it when your app needs to understand speech continuously while the interaction is still unfolding.

REPLYOD#83@OPENAIDEVS @OPENAIDEVS GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper are available in the Realtime API today. https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

REPLYWD#219@WILLDEPUE @WILLDEPUE we already have great PMF with audio transcription in Codex and ChatGPT, which i use all the time its clear audio in, text/actions out is the right interface, with some limited voice responses when needed. but it needs to be always awaiting my command!

REPLYWD#219@WILLDEPUE @WILLDEPUE we’re headed towards Jarvis that can do everything on your computer, listens all the time for your command, with complete fusion of speaking back to you, typing a response, or silently taking action... maybe even video?

REPLYRH#514@ROMAINHUET @ROMAINHUET In this video, @dkundel interrupts me in German, and GPT-Realtime-Translate just figures it out. And here I was, ready to dust off my German! Then @jxnlco jumps in mid-demo to talk about preambles. GPT-Realtime-2 is listening, but stays quiet until I say “back to demo.”

REPLYRH#514@ROMAINHUET @ROMAINHUET @dkundel @jxnlco That’s what feels so magical with this new set of realtime models. Agents can translate live, keep conversations going while thinking in the background, preserve context, and even take action. To get started, ask Codex to add these models to your app! https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

REPLYDF#632@DANIELLEFONG @WILLDEPUE @willdepue it's time

REPOSTAD#520@THEREALADAMG @OPENAIDEVS Voice agents are getting more capable. Here’s what’s new: • GPT-Realtime-2 for voice agents that reason and take action • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages • GPT-Realtime-Whisper, making transcription even faster https://twitter.com/4398626122/status/2052438194625593804

REPOSTAD#520@THEREALADAMG @OPENAIDEVS GPT-Realtime-2 is built for voice agents that need to keep the conversation going while they work. The model is better at harder requests, tool use, recovery behavior, domain-specific language, and tone control while the conversation is happening. We also increased its context window from 32K to 128K, supporting longer conversations and more complex task flows.

REPOSTAD#520@THEREALADAMG @OPENAIDEVS GPT-Realtime-Translate lets you translate speech as it’s spoken. It supports 70+ input languages and 13 output languages, built for live multilingual experiences where people can talk naturally without waiting for a turn-by-turn translation flow.

REPOSTAD#520@THEREALADAMG @OPENAIDEVS GPT-Realtime-Whisper brings low-latency streaming transcription to the Realtime API. Use it when your app needs to understand speech continuously while the interaction is still unfolding.

REPOSTAD#520@THEREALADAMG @OPENAIDEVS GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper are available in the Realtime API today. https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

REPOSTAD#520@THEREALADAMG @SCALEAILABS Congrats to @OpenAI for taking the top spot on our Audio MultiChallenge S2S leaderboard with the release of GPT-Realtime-2 🥇 GPT-Realtime-2 more than doubles GPT-Realtime-1.5 on instruction retention, rising from 36.7% to 70.8% APR, and also stands out on voice editing, especially when users repair or revise what they are saying in real time – crucial for voice agent use cases. Excited to see the pace of progress as voice AI accelerates.

REPOSTAD#520@THEREALADAMG @JUBERTI Big Realtime API drop! - gpt-realtime-2, our first realtime model with reasoning - gpt-realtime-translate for voice-to-voice translation - gpt-realtime-whisper for streaming transcription Docs: https://developers.openai.com/api/docs/guides/realtime https://twitter.com/OpenAIDevs/status/2052440907933474954
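
For developers following the docs link in the post above, here is a minimal Python sketch of how the three model ids might be selected when opening a Realtime API connection. The model ids are taken from the post; the WebSocket endpoint and the `session.update` event shape are assumptions carried over from the existing Realtime API and may differ for these models, and `realtime_url` and `session_update` are hypothetical helper names. The sketch only builds strings and does not open a connection.

```python
import json
from urllib.parse import urlencode

# Assumption: the new models use the same WebSocket endpoint as the
# existing Realtime API. Check the linked docs before relying on this.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"

# Model ids as listed in the post above.
MODELS = {
    "agent": "gpt-realtime-2",
    "translate": "gpt-realtime-translate",
    "transcribe": "gpt-realtime-whisper",
}


def realtime_url(task: str) -> str:
    """Build the connection URL for a task (hypothetical helper)."""
    query = urlencode({"model": MODELS[task]})
    return f"{REALTIME_ENDPOINT}?{query}"


def session_update(instructions: str) -> str:
    """Serialize a session.update client event (assumed event shape)."""
    event = {"type": "session.update", "session": {"instructions": instructions}}
    return json.dumps(event)


if __name__ == "__main__":
    print(realtime_url("agent"))
    print(session_update("Answer briefly and say when you need time to think."))
```

Keeping the model id in one lookup table makes it easy to switch between the agent, translation, and transcription models without touching the connection code.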

REPOSTDU#787@DERYATR_ @DERYATR_ GPT-Realtime-2 is really real-time voice now! This is via API only for now, but I am sure ChatGPT integration of voice with GPT-5-class reasoning is coming soon. It was long overdue! https://twitter.com/openai/status/2052438194625593804

REPOSTAC#851@ANDREWCURRAN_ @OPENAI Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.
