OpenAI Launches New Voice Intelligence Features in Its API
OpenAI just dropped a batch of new voice capabilities into its API, and they add up to more than a chatbot that talks. The company announced three models covering real-time conversation, translation, and transcription. The headline act is GPT-Realtime-2, a voice model that uses GPT-5-class reasoning to handle complex requests. Also new: GPT-Realtime-Translate for live translation across 70+ input languages, and GPT-Realtime-Whisper for streaming speech-to-text.
GPT-Realtime-2: Smarter Voice Conversations
GPT-Realtime-2 is the successor to GPT-Realtime-1.5. The key upgrade is the underlying reasoning engine. OpenAI says it's built with GPT-5-class reasoning, which means it can handle more complicated requests than its predecessor. The model is designed to simulate realistic vocal interactions—think customer support agents that can actually think through a problem, not just read from a script.
Pricing: GPT-Realtime-2 is billed by token consumption, not by the minute. That's a shift from the other two models, which are metered by time. Expect to optimize your prompts to keep token counts down.
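The billing split matters when you're estimating costs. A rough sketch of the two models' math, with placeholder rates (OpenAI's actual prices aren't given in the article):

```python
# Hypothetical cost comparison: token-billed vs. minute-billed voice models.
# The rates used below are placeholders, NOT OpenAI's published prices.

def token_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of a token-billed session (GPT-Realtime-2-style billing)."""
    return tokens / 1000 * price_per_1k

def minute_cost(minutes: float, price_per_minute: float) -> float:
    """Cost of a minute-billed session (Translate/Whisper-style billing)."""
    return minutes * price_per_minute

# A 10-minute support call that consumes ~8,000 tokens, at made-up rates:
session_tokens = 8_000
session_minutes = 10.0

print(f"token-billed:  ${token_cost(session_tokens, price_per_1k=0.06):.2f}")
print(f"minute-billed: ${minute_cost(session_minutes, price_per_minute=0.06):.2f}")
```

The takeaway: with token billing, a terse system prompt and trimmed conversation history directly cut your bill; with minute billing, only call length does.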
GPT-Realtime-Translate: Real-Time, Conversational Translation
GPT-Realtime-Translate does exactly what it sounds like: real-time translation that keeps pace with the speaker. It supports over 70 input languages (languages it can understand) and 13 output languages (languages it speaks back). This isn't a turn-based system—it's meant to be conversational, with the model responding in the target language as the user speaks.
Use cases: multilingual customer support, live events, international meetings. The model is billed by the minute, so long conversations will cost accordingly.
GPT-Realtime-Whisper: Live Transcription
GPT-Realtime-Whisper provides live speech-to-text as interactions happen. Unlike batch transcription, this streams text in real time, making it suitable for live captioning, note-taking during calls, or voice-controlled interfaces.
Also billed by the minute.
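Streaming transcription typically arrives as incremental text fragments rather than one finished block. The event shape below (delta fragments followed by a done marker) is an assumption about how a streaming speech-to-text model like GPT-Realtime-Whisper might deliver text, not a documented schema; adapt the field names to the actual API.

```python
# Fold a stream of incremental transcription events into finished utterances.
# The {"type": "delta"/"done"} event shape is assumed for illustration.

def accumulate_transcript(events):
    """Collect partial text deltas; emit a full utterance on each 'done'."""
    utterances, buffer = [], []
    for event in events:
        if event["type"] == "delta":
            buffer.append(event["text"])        # partial text as it streams in
        elif event["type"] == "done":
            utterances.append("".join(buffer))  # utterance is complete
            buffer = []
    return utterances

stream = [
    {"type": "delta", "text": "Hello, "},
    {"type": "delta", "text": "how can I help?"},
    {"type": "done"},
]
print(accumulate_transcript(stream))  # → ['Hello, how can I help?']
```

For live captioning you'd render each delta immediately and replace the line once the utterance closes; the fold above is the batch-friendly version of the same logic.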
Guardrails and Abuse Prevention
OpenAI says it has built guardrails to prevent misuse—spam, fraud, or other abuse. The system includes triggers that halt a conversation if it violates content-safety guidelines. This is critical for any production deployment, especially in customer-facing applications.
Who Should Care?
Customer service is the obvious first target. But OpenAI explicitly calls out education, media, events, and creator platforms. Think language learning apps that converse naturally, live event translation, or interactive voice experiences.
Developer Takeaways
All three models are available through the Realtime API. You can mix and match: use GPT-Realtime-2 for reasoning, drop in Translate for multilingual support, and Whisper for transcription. The token-based pricing for GPT-Realtime-2 means you'll want to manage conversation length carefully. For Translate and Whisper, it's all about runtime minutes.
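Connecting to the Realtime API happens over a WebSocket, with the model selected via a query parameter. A minimal sketch of building that connection, where the model identifier strings are taken from the announcement and should be treated as placeholders until OpenAI publishes the exact API names:

```python
import os

REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"

# Model identifiers are taken from the announcement; the exact strings
# the API accepts may differ, so treat these as placeholders.
MODELS = {
    "reasoning": "gpt-realtime-2",
    "translate": "gpt-realtime-translate",
    "transcribe": "gpt-realtime-whisper",
}

def build_connection(task: str, api_key: str):
    """Return the WebSocket URL and headers for a Realtime API session."""
    url = f"{REALTIME_ENDPOINT}?model={MODELS[task]}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",  # header used by the beta Realtime API
    }
    return url, headers

url, headers = build_connection(
    "reasoning", os.environ.get("OPENAI_API_KEY", "sk-placeholder")
)
print(url)  # wss://api.openai.com/v1/realtime?model=gpt-realtime-2
```

Swapping `task` is how you'd mix and match: one session per model, with your app routing audio to whichever connection the moment calls for.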
What's Next?
If you're building voice interfaces, now is the time to experiment. The GPT-5 reasoning in a voice model opens up possibilities beyond simple Q&A. Try it in your dev environment, test latency, and watch your token burn.
All information sourced from TechCrunch article by Lucas Ropek.