Reddit sues Anthropic for scraping user data to train AI

News Room

Reddit is taking Anthropic to court, accusing the artificial intelligence company of pulling user content from the platform without permission and using it to train its Claude AI models. The lawsuit, filed in a California state court, claims Anthropic made more than 100,000 unauthorised requests to Reddit’s servers, even after publicly stating that it had stopped.

The case is built around Reddit’s claim that Anthropic ignored both technical restrictions and its terms of service. According to the complaint, Anthropic bypassed protections like the site’s robots.txt file, which tells automated crawlers which parts of the site they may not access. Reddit also accuses Anthropic of violating user privacy by collecting and using personal posts, including deleted content, for commercial purposes.
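For readers unfamiliar with robots.txt: it is a plain-text file of voluntary rules that well-behaved crawlers read before fetching pages, and nothing in the file itself technically stops a crawler that chooses to ignore it. The short Python sketch below, which uses the standard library's `urllib.robotparser` module, shows how a crawler that honours the convention might check whether it may fetch a given URL. The rules, the bot names ("ExampleAIBot", "OtherBot") and the URLs are hypothetical illustrations and are not Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; NOT Reddit's real file.
# The first group blocks one named bot entirely; the second applies to all
# other bots and blocks only the /private/ section.
EXAMPLE_RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)


print(is_allowed(EXAMPLE_RULES, "ExampleAIBot", "https://www.example.com/r/python/"))  # False
print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://www.example.com/r/python/"))  # True
print(is_allowed(EXAMPLE_RULES, "OtherBot", "https://www.example.com/private/page"))  # False
```

The check only matters if the crawler actually runs it and respects the answer, which is why disputes like this one turn on conduct and contract terms rather than on the file alone.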

Reddit says it offers structured access to its data through licensing agreements with companies such as OpenAI and Google. These deals include conditions around content use, privacy safeguards, and data deletion. According to the platform, Anthropic declined to pursue a formal agreement and instead scraped the site directly, avoiding licensing fees and skipping user protections in the process.

The lawsuit highlights a 2021 research paper co-authored by Anthropic CEO Dario Amodei, which pointed to Reddit as a rich source of training data for language models. Reddit also included examples where Claude appeared to reproduce Reddit posts nearly word for word, even echoing posts that had been deleted by users. That, the company says, shows Anthropic failed to put guardrails in place to respect user privacy or content takedowns.

Reddit is seeking financial damages and a court order that would stop Anthropic from using Reddit content in future versions of its models.

Anthropic has responded, saying it disagrees with Reddit’s claims and plans to defend itself. However, this is not the first time the company has come under legal pressure over how it collects training data.

In August 2024, a group of authors filed a class-action lawsuit accusing Anthropic of using their copyrighted work without permission. They claimed that the firm trained its models on books and other written materials without their consent, and they sought compensation for the use of their content.

A similar case from October 2023 involved Universal Music Group and other music publishers. They sued Anthropic over claims that its Claude chatbot was reproducing copyrighted song lyrics. The music companies argued that this use violated their intellectual property rights and asked the court to block further use of their lyrics.

Unlike those lawsuits, Reddit’s case doesn’t focus on copyright. Instead, it centres on breach of contract and unfair competition. Reddit’s argument is that the data taken from its site isn’t just public—it’s governed by terms that Anthropic knowingly ignored. That distinction could make the case an important one for other platforms that host user content but want to control how it’s used in commercial AI systems.

Reddit also accuses Anthropic of misleading the public. The lawsuit points to public statements from Anthropic claiming it respects scraping rules and values user privacy, which Reddit says were contradicted by the company’s actions.

“For its part, despite what its marketing material says, Anthropic does not care about Reddit’s rules or users,” the lawsuit reads. “It believes it is entitled to take whatever content it wants and use that content however it desires, with impunity.”

After the lawsuit was filed, Reddit’s stock rose nearly 67%, suggesting that investors welcomed the move. The outcome of the case could set a precedent for how companies balance open internet content against the rights of users and content owners.

As more AI firms rely on large volumes of online data, the legal and ethical questions around scraping are getting harder to ignore. Reddit’s case adds to the growing list of lawsuits shaping how this next wave of AI development unfolds.

(Photo by Brett Jordan)
