Google’s AI Overviews Produce Hundreds of Millions of Inaccurate Answers Every Day, Analysis Suggests

News Room

Google’s AI Overview feature — which provides an AI-generated summary in response to search queries — is producing hundreds of millions of incorrect answers every day, according to an analysis conducted by AI startup Oumi on behalf of The New York Times.

Using the industry standard SimpleQA benchmark, Oumi ran and evaluated 4,326 Google searches, finding that results were accurate 85% of the time with Gemini 2, and 91% of the time when AI Overviews ran on the more recent Gemini 3 model.

Given that Google handles over 5 trillion searches per year, even a 9% inaccuracy rate would equate to approximately 225 billion false or misleading summaries every year (or 616.4 million per day), assuming that AI Overviews are produced for 50% of searches.
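That back-of-envelope arithmetic can be checked in a few lines. This is a sketch of the article's own calculation, using its stated figures; the 50% AI-Overview coverage rate is the article's assumption, not a Google-confirmed number.

```python
# Figures as stated in the article
searches_per_year = 5e12   # "over 5 trillion searches per year"
overview_share = 0.50      # assumed share of searches that trigger an AI Overview
inaccuracy_rate = 0.09     # 9% inaccuracy (Gemini 3 was accurate 91% of the time)

inaccurate_per_year = searches_per_year * overview_share * inaccuracy_rate
inaccurate_per_day = inaccurate_per_year / 365

print(f"{inaccurate_per_year / 1e9:.0f} billion per year")   # 225 billion per year
print(f"{inaccurate_per_day / 1e6:.1f} million per day")     # 616.4 million per day
```

Note how sensitive the headline number is to the coverage assumption: halving the overview share to 25% halves the daily figure to roughly 308 million.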

Most accurate responses are ‘ungrounded’

The issues don’t begin and end with inaccurate answers, however, since Oumi’s analysis found that 56% of the accurate answers produced by AI Overview were “ungrounded,” in that they cited sources which didn’t actually support what they were asserting.

Perhaps surprisingly, this has become more of a problem with Gemini 3: when AI Overviews ran on the older Gemini 2, only 37% of accurate summaries were ungrounded, against 56% on the newer model.

The feature also appears to rely on sources which some sticklers for accuracy may find questionable: of the 5,380 sources referenced by Google’s AI summaries, the second and fourth most-cited were Facebook and Reddit.

Oumi’s analysis showed AI Overviews used Facebook as a source 5% of the time in instances where its summaries were accurate, and 7% of the time when summaries were inaccurate.

Responding to these findings, Google acknowledged that AI models can make mistakes, but argued that Oumi’s analysis was based on an OpenAI-developed benchmark test that itself contains flaws and inaccurate data.

“This study has serious holes,” said spokesperson Ned Adriance. “It doesn’t reflect what people are actually searching on Google.”

Despite this argument, Google’s own test data shows that Gemini 3 — which currently powers AI Overview — provides incorrect information in 28% of queries, although it becomes more accurate when used in combination with Google’s search engine.

Most users are not doing their own research

While LLM inaccuracy is now a familiar topic to most observers, the study from Oumi and The New York Times delivers a timely reminder that we shouldn’t be too quick to trust AI-generated results.

Yet research indicates that many of us are doing just that, with a Pew Research Center survey from July 2025 finding that Google Search users who see an AI Overview are less likely to click links to websites that appear in their search results.

Users who saw an AI summary clicked on a traditional search result link in 8% of all visits, while users who did not receive such a summary were almost twice as likely to do so, clicking on a search result in 15% of cases.

What’s more, the same survey revealed that users who encountered an AI Overview clicked on a link included in that overview in only 1% of instances, indicating that few may be bothering to check the veracity of what they’re seeing.

The scale and spread of misinformation may therefore be considerable, given that Google’s own data reveals that its AI Overviews already had 2 billion monthly users back in July 2025, reaching over 200 jurisdictions and 40 languages.

