Gemini Audio Models Get a Full Upgrade: Real-Time Voice Agents Enter the Production-Ready Era

Improved Gemini audio models for powerful voice experiences

Source: Google DeepMind Blog · Date: 2025-12-12 · Original language: English

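Google has upgraded its Gemini 2.5 Flash Native Audio model, markedly improving function calling, instruction following, and conversational fluency. Live speech translation is rolling out in beta in the Google Translate app on Android in the US, Mexico, and India. Enterprise use cases include Shopify Sidekick and United Wholesale Mortgage's AI assistant.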

Google has enhanced Gemini 2.5 Flash Native Audio for live voice agents. Key improvements are sharper function calling, robust instruction following, and smoother conversations. Live speech translation is now rolling out in beta in the Google Translate app on Android in the US, Mexico, and India. Use cases include Shopify Sidekick (for merchants) and United Wholesale Mortgage's Mia (loan generation, with over 14,000 loans generated). The model is available via the Gemini API in Google AI Studio. Voice AI is becoming production-ready for customer-facing applications.

Impact on the AI Industry

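Voice AI has long been stuck with "toy-grade" experiences that could not support real business scenarios. The Gemini audio models' gains in function-calling accuracy mean an AI can look up information and carry out actions in real time during a call, rather than merely holding a simple conversation. Commercial deployments at scale have already appeared at Shopify and in mortgage lending, a sign that voice AI is crossing the critical threshold from demo to production.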
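To make "executing actions during a call" concrete, here is a minimal, library-free sketch of the application side of function calling: the model emits a structured request (a function name plus arguments), and the application routes it to real code and returns the result. It does not use the Gemini API. The tool name `get_order_status`, the `{"name": ..., "args": ...}` call shape, and the sample data are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry; names and handlers are illustrative only.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function so a model-issued call can be routed to it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_status(order_id: str) -> Dict[str, Any]:
    # Stand-in for a real backend lookup; the data below is made up.
    fake_db = {"A100": "shipped", "A101": "processing"}
    return {"order_id": order_id, "status": fake_db.get(order_id, "not_found")}


def handle_function_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run one function call requested by the model and wrap the result.

    `call` is assumed to look like {"name": str, "args": dict}.
    Errors are returned as data so the conversation can continue.
    """
    name = call.get("name")
    args = call.get("args", {})
    handler = TOOLS.get(name)
    if handler is None:
        return {"name": name, "response": {"error": f"unknown tool: {name}"}}
    try:
        return {"name": name, "response": handler(**args)}
    except TypeError as exc:
        # Typically raised when the arguments do not match the handler signature.
        return {"name": name, "response": {"error": f"bad arguments: {exc}"}}


if __name__ == "__main__":
    request = {"name": "get_order_status", "args": {"order_id": "A100"}}
    print(json.dumps(handle_function_call(request)))
```

In a live voice session, the response dictionary would be sent back to the model so that it can answer the caller in speech. Returning errors as data rather than raising keeps a malformed call from ending the conversation, which is one reason function-calling accuracy matters for production use.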


Original Reference

Source: Google DeepMind Blog · 2025-12-12