Google I/O 2025: Gemini 2.5 AI Models Upgraded With Deep Think Mode, Native Audio Output

Published 12 hours ago · 3 minute read

Google showcased several new features for the Gemini 2.5 family of artificial intelligence (AI) models at Google I/O 2025 on Tuesday. The Mountain View-based tech giant introduced an enhanced reasoning mode dubbed Deep Think, which is powered by the Gemini 2.5 Pro model. It also unveiled Native Audio Output, a capability for more natural, human-like speech that will be available via the Live application programming interface (API). The company is also bringing thought summaries and thinking budgets to the latest Gemini models for developers.

In a blog post, the tech giant detailed all the new capabilities and features that it will be shipping to the Gemini 2.5 AI model series throughout the next few months. Earlier this month, Google released an updated version of the Gemini 2.5 Pro with improved coding capabilities. The updated model also ranked in the top position on the WebDev Arena and LMArena leaderboards.

Now, Google is improving the AI model further with the Deep Think mode. The new reasoning mode allows Gemini 2.5 Pro to consider multiple hypotheses before responding. The company says it uses different research techniques from those behind the Thinking versions of its older models.

Based on internal testing, the tech giant shared the reasoning mode's scores across several benchmarks. Notably, Gemini 2.5 Pro Deep Think is claimed to score 49.4 percent on the 2025 USAMO (United States of America Mathematical Olympiad), one of the toughest mathematics benchmarks. It also scores competitively on LiveCodeBench v6 and MMMU.

Deep Think is currently under testing, and Google says it is conducting safety evaluations and getting input from safety experts. Currently, the reasoning mode is only available to trusted testers via the Gemini API. There is no word on its release date.

Google also announced new capabilities for the Gemini 2.5 Flash model, which was released just a month ago. The company said the AI model has improved on key benchmarks for reasoning, multimodality, code, and long context. It is also more efficient, using 20 to 30 percent fewer tokens than the previous version, the company claimed.

This new version of Gemini 2.5 Flash is currently available in preview to developers via Google AI Studio. Enterprises can access it via the Vertex AI platform, and individuals can find it in the Gemini app. The model is expected to be generally available for production use in June.
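For developers trying the preview, a call takes only a few lines with the google-genai Python SDK. Note that the model ID below is an assumed preview identifier and the API key a placeholder; check Google AI Studio for the current values:

```python
# Minimal sketch: calling the Gemini 2.5 Flash preview via the google-genai
# Python SDK. The model ID and API key are placeholders/assumptions.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.5-flash-preview",  # assumed preview ID; check the model list
    contents="Summarize the trade-offs of long-context prompts.",
)
print(response.text)
```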

Developers accessing the Live API will now get a new feature with the Gemini 2.5 series of AI models. The company is introducing a preview version of Native Audio Output, which can generate speech in a more expressive and human-like manner. Google said the feature allows users to control the tone, accent, and style of speech generated.

The early version of the capability comes with three features. The first is Affective Dialogue, where the AI model can detect emotions in the user's voice and respond accordingly. The second is Proactive Audio, which enables the model to ignore background conversations and respond only when it is spoken to. The third is Thinking, which lets speech generation leverage Gemini's thinking capabilities to verbally answer complex queries.
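For developers, here is a minimal sketch of what driving these options through the Live API could look like with the google-genai Python SDK. The model ID and the affective/proactive configuration fields are assumptions drawn from the preview and may change; treat this as an illustration rather than the final API surface:

```python
# Minimal sketch of Native Audio Output over the Live API (google-genai SDK).
# The model ID and the affective/proactive config fields are assumptions
# based on the preview; check current docs before relying on them.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],  # spoken replies instead of text
        # The two fields below are assumptions based on the preview:
        enable_affective_dialog=True,   # react to emotion in the user's voice
        proactivity=types.ProactivityConfig(proactive_audio=True),  # skip background chatter
    )
    async with client.aio.live.connect(
        model="gemini-2.5-flash-preview-native-audio-dialog",  # assumed model ID
        config=config,
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Briefly explain context windows.")],
            )
        )
        async for message in session.receive():
            if message.data:  # chunks of generated audio bytes
                pass          # e.g. write to a playback buffer

asyncio.run(main())
```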

Apart from this, the 2.5 Pro and Flash models in the Gemini API and Vertex AI will also show thought summaries. These organize the model's raw thought process, which was previously only visible in Gemini's reasoning models, into a clear format. Google will now show a detailed summary, including headers, key details, and information about model actions, with every response.

In the coming weeks, developers will also be able to use thinking budgets with Gemini 2.5 Pro. This will let them control how many tokens the model spends on reasoning before it responds. Finally, Project Mariner's Computer Use agentic capability will also come to the Gemini API and Vertex AI soon.
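Here is a sketch of how both features could look together in the google-genai Python SDK, assuming a preview Pro model ID; the budget value is arbitrary, and `include_thoughts` is used to surface the thought summaries described above:

```python
# Minimal sketch: capping reasoning tokens and requesting thought summaries
# via the google-genai SDK. The model ID and budget value are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-pro-preview",  # assumed preview ID
    contents="Design a rate limiter for a public API.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,   # cap on tokens spent thinking
            include_thoughts=True,  # also return summarized reasoning
        )
    ),
)

for part in response.candidates[0].content.parts:
    if part.thought:  # parts flagged as thought summaries
        print("Thought summary:", part.text)
    else:
        print("Answer:", part.text)
```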
