Google cooks up its ‘most intelligent’ AI model to date

Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date.

The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks.

According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 models are “thinking models”. That means they can reason through their thoughts before generating a response, which improves both performance and accuracy.

The capacity for “reasoning” extends beyond mere classification and prediction, Kavukcuoglu explains. It encompasses the system’s ability to analyse information, deduce logical conclusions, incorporate context and nuance, and ultimately, make informed decisions.

DeepMind has been exploring methods to enhance AI’s intelligence and reasoning capabilities for some time, employing techniques such as reinforcement learning and chain-of-thought prompting. This groundwork led to the recent introduction of its first thinking model, Gemini 2.0 Flash Thinking.

“Now, with Gemini 2.5,” says Kavukcuoglu, “we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training.”

Google plans to integrate these thinking capabilities directly into all of its future models, enabling them to tackle more complex problems and support more capable, context-aware agents.

Gemini 2.5 Pro secures the LMArena leaderboard top spot

Gemini 2.5 Pro Experimental is positioned as DeepMind’s most advanced model for handling intricate tasks. At the time of writing, it has secured the top spot on the LMArena leaderboard – a key metric for assessing human preferences – by a significant margin, demonstrating a highly capable model with a high-quality style.

Gemini 2.5 is a ‘pro’ at maths, science, coding, and reasoning

Gemini 2.5 Pro has demonstrated state-of-the-art performance across various benchmarks that demand advanced reasoning.

Notably, it leads in maths and science benchmarks – such as GPQA and AIME 2025 – without relying on test-time techniques that increase costs, like majority voting. It also achieved a state-of-the-art score of 18.8% on Humanity’s Last Exam, a dataset designed by subject matter experts to evaluate the human frontier of knowledge and reasoning.
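
For readers unfamiliar with the term, majority voting means sampling several answers to the same question and keeping the most frequent one, which multiplies inference cost by the number of samples. The following is a minimal Python sketch of the idea, assuming the answers have already been sampled and normalised to strings. The function name majority_vote and the sample data are illustrative only and do not come from DeepMind’s evaluation setup.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among several sampled answers.

    Hypothetical helper for illustration: `answers` is a list of final
    answers (as strings) produced by running the same prompt several times.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)]; on a tie, Counter keeps
    # the answer that was encountered first.
    answer, _ = counts.most_common(1)[0]
    return answer


# Made-up samples: five runs of the same maths question.
samples = ["42", "41", "42", "42", "17"]
print(majority_vote(samples))
```

Each extra sample is another full model call, which is why a benchmark score reported without this technique reflects single-attempt performance at lower cost.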

DeepMind has placed significant emphasis on coding performance, and Gemini 2.5 represents a substantial leap forward compared to its predecessor, 2.0, with further improvements in the pipeline. Gemini 2.5 Pro excels at creating visually compelling web applications and agentic code applications, as well as at code transformation and editing.

On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. The model’s reasoning capabilities also enable it to create a video game by generating executable code from a single-line prompt.

Building on its predecessors’ strengths

Gemini 2.5 builds upon the core strengths of earlier Gemini models, including native multimodality and a long context window. 2.5 Pro launches with a one million token context window, with plans to expand this to two million tokens soon. This enables the model to comprehend vast datasets and handle complex problems from diverse information sources, spanning text, audio, images, video, and even entire code repositories.

Developers and enterprises can now begin experimenting with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced users can also access it via the model dropdown on desktop and mobile platforms. The model will be rolled out on Vertex AI in the coming weeks.

Google DeepMind encourages users to provide feedback, which will be used to further enhance Gemini’s capabilities.

(Photo by Anshita Nair)

