OpenAI just raised the stakes for voice AI.
The company unveiled gpt-realtime, its most advanced speech-to-speech model yet, and made its Realtime API generally available. It is a one-two punch that could redefine customer support automation while putting dozens of smaller startups on notice.
Key Takeaways
- OpenAI launches gpt-realtime, its most advanced speech-to-speech model.
- Realtime API exits beta with multimodal support and new enterprise tools.
- Startups relying on Twilio integrations may lose their competitive edge.
- T-Mobile is already testing the tech for customer support improvements.
- Pricing concerns and limited observability spark debate among CEOs.
OpenAI launched gpt-realtime and made its Realtime API generally available, enabling production-ready voice AI agents for customer support and beyond. The move threatens smaller startups that rely on Twilio-like phone integrations, while early adopters such as T-Mobile report more natural, human-like customer conversations with the model.
OpenAI Pushes Into Voice AI — And Shakes the Market
OpenAI has rolled out gpt-realtime, a powerful speech-to-speech model, and fully launched its Realtime API. Together, they promise to make it easier than ever for businesses to build production-ready voice agents, particularly for customer support.
The announcement puts pressure on smaller startups that have thrived by bridging phone systems with existing AI services. Now, OpenAI is offering enterprises a direct path — and cutting out the middle layer.
Why Startups Are on Edge
Some conversational AI startups rely heavily on Twilio-style integrations to connect speech AI to public phone networks. Andreas Granig, CEO at Sipfront, warned that these companies may struggle to differentiate now that OpenAI offers its own SIP (Session Initiation Protocol) interface for connecting directly to phone systems.
“There are quite some startups who only provide an interface… They are in hot water now,” Granig noted on LinkedIn.
That said, startups that specialize in complex integrations, particularly agents that call external tools, remain somewhat insulated. Still, differentiation just got harder in a crowded market.
Inside the gpt-realtime Model
Older pipelines chain separate models for transcription, language processing, and text-to-speech. gpt-realtime handles all three in a single model, which cuts response latency, produces more natural-sounding audio, and lets the model pick up nonverbal cues such as laughter or sighs.
The model can also adjust its tone, pace, and style, and can even roleplay characters, while handling messy inputs such as unclear audio and long alphanumeric strings like account or order numbers. These capabilities are particularly valuable in contact centers.
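To illustrate the older chained design, the toy Python sketch below models it as three swappable stages with plain text passed between them. Every name here (`ChainedVoicePipeline`, the stub stages, the digit-redacting guardrail) is hypothetical and stands in for real speech-to-text, language-model, and text-to-speech services; it is not OpenAI's API. The point is structural: because text is exposed between stages, a developer can insert checks or swap the voice at each hand-off.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures; real systems would call external STT, LLM, and TTS services.
Transcriber = Callable[[bytes], str]
Responder = Callable[[str], str]
Synthesizer = Callable[[str], bytes]
Guardrail = Callable[[str], str]


@dataclass
class ChainedVoicePipeline:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer  # swapping voices means passing a different synthesizer
    # Text-level checks that run between stages; possible only because text is exposed.
    input_guards: List[Guardrail] = field(default_factory=list)
    output_guards: List[Guardrail] = field(default_factory=list)

    def handle(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        for guard in self.input_guards:
            text = guard(text)
        reply = self.respond(text)
        for guard in self.output_guards:
            reply = guard(reply)
        return self.synthesize(reply)


# Stub stages for illustration only: "audio" is just UTF-8 bytes here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def redact_digits(text: str) -> str:
    # Hypothetical guardrail: mask digits before the language model sees them.
    return "".join("#" if ch.isdigit() else ch for ch in text)


pipeline = ChainedVoicePipeline(fake_stt, fake_llm, fake_tts, input_guards=[redact_digits])
print(pipeline.handle(b"my card is 4242"))  # b'You said: my card is ####'
```

A unified speech-to-speech model collapses these three stages into a single audio-in, audio-out step. That merger is where the latency and naturalness gains come from, and it is also why stage-level hooks like the guardrail above are harder to add.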
The Cost Factor
Despite the technical leap, pricing may slow adoption. At $32 per million audio input tokens and $64 per million audio output tokens, gpt-realtime costs roughly four times as much as a traditional chained pipeline, according to Alex Levin, CEO at Regal.
Levin also raised concerns about control: with a single end-to-end model, startups lose the flexibility to swap voices or add guardrails at individual stages of the interaction, as they can when transcription, reasoning, and speech synthesis are separate steps.
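To put the quoted rates in concrete terms, the short Python sketch below prices a single call from its audio token counts. The two rates come from the figures above; the token counts and call volume are hypothetical placeholders, not measurements, and the sketch counts only audio input and output tokens, ignoring any other line items a real bill might include.

```python
# Rates quoted in the article for gpt-realtime (USD per 1 million audio tokens).
AUDIO_INPUT_PER_M = 32.00
AUDIO_OUTPUT_PER_M = 64.00


def audio_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one session's audio tokens at the quoted per-million rates."""
    return (input_tokens * AUDIO_INPUT_PER_M + output_tokens * AUDIO_OUTPUT_PER_M) / 1_000_000


# Hypothetical call: these token counts are illustrative, not measured.
cost = audio_cost_usd(input_tokens=20_000, output_tokens=10_000)
print(f"${cost:.2f}")  # $1.28

# Hypothetical monthly volume for a support line.
calls_per_month = 10_000
print(f"${cost * calls_per_month:,.2f}")  # $12,800.00
```

Because output tokens cost twice as much as input tokens, in this sketch an agent that talks more than it listens drives the bill up faster than one that mostly listens.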
Industry Response: T-Mobile’s Early Bet
T-Mobile has been piloting OpenAI’s technology for months and is now leaning on gpt-realtime to reinvent customer conversations.
In a demo, the AI assistant helped a customer find a phone under $300, check satellite compatibility, and confirm plan eligibility — all while sounding strikingly human.
Julianne Roberson, Director of AI at T-Mobile, said the model “feels more human,” noting that it keeps track of unpredictable conversations and recognizes emotional cues.
This fits T-Mobile’s broader mission of delivering “expert-level service everywhere” through AI, a vision aligned with Sam Altman’s bold prediction that human customer service may soon become obsolete.
The Bigger Picture
OpenAI’s expansion into real-time voice signals a new front in the AI race. If enterprises adopt the model at scale, customer support — one of the most labor-intensive and expensive business functions — could see sweeping automation.
Yet large enterprises may not be the only winners. Developers and midsize firms could benefit from new low-friction tools, provided they can absorb the cost. Startups without a defensible moat, however, will need to pivot fast to survive.
What Happens Next
Industry watchers will be looking closely at adoption metrics over the coming months. Will enterprises pay a premium for smoother, more natural AI calls? Or will cost and control concerns keep the chained pipeline approach alive?
Either way, the pressure is on. For many conversational AI startups, the time to prove value — or pivot — is now.
Conclusion
OpenAI’s release of gpt-realtime and its Realtime API marks a turning point for voice AI. Enterprises see new opportunities, startups see new threats, and customers may soon be speaking to AI agents that sound more human than ever.