OpenAI、gpt-realtimeとRealtime APIの大幅アップデートを発表 Introducing gpt-realtime and Realtime API updates

OpenAI Blog · openai.com · 2025/08/28 19:00 · 9mo ago · 📖 1 min

AI 3 行サマリ

OpenAIが本番向け音声合成モデルgpt-realtimeとRealtime API正式版を公開。
リモートMCPサーバー、画像入力、SIP電話対応などを追加。

English summary

We’re releasing a more advanced speech-to-speech model and new API capabilities including MCP server support, image input, and SIP phone calling support.

OpenAIは2025年8月28日、本番環境向けの新しい音声合成（speech-to-speech）モデル「gpt-realtime」と、Realtime APIの正式版（GA）を発表した。これまでベータ提供だったRealtime APIが正式リリースとなり、開発者はより安定した環境で音声エージェントを構築できるようになる。

新機能として、リモートMCPサーバーとの連携、画像入力のサポート、SIP電話接続のサポートが追加された。これらにより、電話システムへの統合や視覚情報を組み合わせたマルチモーダルな音声体験の構築が可能になるとみられる。詳細な料金体系やモデルの性能指標については、OpenAIの公式ページで確認することを推奨する。

On August 28, 2025, OpenAI announced gpt-realtime, a more advanced speech-to-speech model designed for production use, along with a generally available release of the Realtime API. The move signals OpenAI's push to make low-latency voice interactions a first-class offering for developers building voice agents at scale.

The updated API introduces several notable capabilities: support for remote MCP servers, image input, and SIP phone calling. MCP server integration suggests developers can connect voice agents to a broader ecosystem of tools and data sources, while SIP support opens pathways for integration with traditional telephony infrastructure. Image input hints at multimodal conversations where visual context can inform real-time speech responses.

Pricing details, specific model benchmarks, and regional availability have not been confirmed in the collected context, so developers should consult the official OpenAI announcement for full technical specifications before planning production deployments.