How Wikipedia is fighting AI slop content


With the rise of AI writing tools, Wikipedia editors have had to deal with an onslaught of AI-generated content filled with false information and phony citations. Already, the community of Wikipedia volunteers has mobilized to fight back against AI slop, something Wikimedia Foundation product director Marshall Miller likens to a sort of “immune system” response.

“They are vigilant to make sure that the content stays neutral and reliable,” Miller says. “As the internet changes, as things like AI appear, that’s the immune system adapting to some kind of new challenge and figuring out how to process it.”

One way Wikipedians are sloshing through the muck is with the “speedy deletion” of poorly written articles, as reported earlier by 404 Media. A Wikipedia reviewer who expressed support for the rule said they are “flooded non-stop with horrendous drafts.” They add that the speedy removal “would greatly help efforts to combat it and save countless hours picking up the junk AI leaves behind.” Another says the “lies and fake references” inside AI outputs take “an incredible amount of experienced editor time to clean up.”

Typically, articles flagged for removal on Wikipedia enter a seven-day discussion period during which community members determine whether the site should delete the article. The newly adopted rule will allow Wikipedia administrators to circumvent these discussions if an article is clearly AI-generated and wasn’t reviewed by the person submitting it. That means looking for three main signs:

  • Writing directed toward the user, such as “Here is your Wikipedia article on…,” or “I hope that helps!”
  • “Nonsensical” citations, including those with incorrect references to authors or publications.
  • Non-existent references, like dead links, ISBNs with invalid checksums, or unresolvable DOIs.
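Some of those signs are mechanical enough to check automatically. As a rough illustration (a hypothetical Python sketch, not Wikipedia’s actual tooling), an ISBN’s checksum can be verified with a few lines of arithmetic; a reference whose ISBN fails this check cannot point to a real book edition.

```python
# Minimal sketch (not Wikipedia's tooling): validating ISBN checksums, one of
# the mechanical signals editors look for. Function names are illustrative.

def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weighted sum (weights 10..1) must be divisible by 11; final char may be 'X' (=10)."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10:
        return False
    total = 0
    for i, ch in enumerate(digits):
        if ch == "X" and i == 9:
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0

def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: digits weighted alternately 1 and 3 must sum to a multiple of 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn10("0-306-40615-2"))      # True: a well-known valid ISBN-10
print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-0"))  # False: checksum digit doesn't match
```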

These aren’t the only signs of AI writing that Wikipedians are looking out for, though. As part of the WikiProject AI Cleanup, which aims to tackle an “increasing problem of unsourced, poorly written AI-generated content,” editors put together a list of phrases and formatting characteristics that chatbot-written articles typically exhibit.

The list goes beyond calling out the excessive use of em dashes (“—”) that have become associated with AI chatbots, and even includes an overuse of certain conjunctions, like “moreover,” as well as promotional language, such as describing something as “breathtaking.” There are other formatting issues the page advises Wikipedians to look out for, too, including curly quotation marks and apostrophes instead of straight ones.
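None of these quirks proves anything on its own, but they are easy to count. As a purely illustrative sketch (hypothetical code, not WikiProject AI Cleanup’s actual method), a script might tally a handful of them to flag an article for human review:

```python
import re

# Hypothetical sketch, not WikiProject AI Cleanup's tooling: tally a few of the
# surface-level tells mentioned above. These counts are hints for a human
# reviewer, never proof of AI authorship on their own.

def slop_signals(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text.lower())
    word_count = max(len(words), 1)
    return {
        # Em dash density (the "\u2014" character called out above).
        "em_dashes_per_100_words": 100 * text.count("\u2014") / word_count,
        # Curly quotation marks and apostrophes instead of straight ones.
        "curly_quote_chars": sum(text.count(c) for c in "\u201c\u201d\u2018\u2019"),
        # Two words named in the article; the editors' real checklist is much longer.
        "stock_words": sum(words.count(w) for w in ("moreover", "breathtaking")),
    }

sample = "Moreover, the scenery is breathtaking \u2014 a truly stunning sight."
print(slop_signals(sample))
```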

However, Wikipedia’s speedy removal page notes that these characteristics “should not, on their own, serve as the sole basis” for determining that something has been written by AI, making it subject to removal. The speedy deletion policy isn’t just for AI-generated slop content, either. The online encyclopedia also allows for the quick removal of pages that harass their subject, contain hoaxes or vandalism, or espouse “incoherent text or gibberish,” among other things.

The Wikimedia Foundation, which hosts the encyclopedia but doesn’t have a hand in creating policies for the website, hasn’t always seen eye-to-eye with its community of volunteers about AI. In June, the Wikimedia Foundation paused an experiment that put AI-generated summaries at the top of articles after facing backlash from the community.

Despite varying viewpoints about AI across the Wikipedia community, the Wikimedia Foundation isn’t against using it as long as it results in accurate, high-quality writing.

“It’s a double-edged sword,” Miller says. “It’s causing people to be able to generate lower quality content at higher volumes, but AI can also potentially be a tool to help volunteers do their work, if we do it right and work with them to figure out the right ways to apply it.” For example, the Wikimedia Foundation already uses AI to help identify article revisions containing vandalism, and its recently published AI strategy includes supporting editors with AI tools that will help them automate “repetitive tasks” and translation.

The Wikimedia Foundation is also actively developing a non-AI-powered tool called Edit Check that’s geared toward helping new contributors follow its policies and writing guidelines. Eventually, it might help ease the burden of unreviewed AI-generated submissions, too. Right now, Edit Check can remind writers to add citations if they’ve written a large amount of text without one, and it can check their tone to help keep the writing neutral.

The Wikimedia Foundation is also working on adding a “Paste Check” to the tool, which will ask users who’ve pasted a large chunk of text into an article whether they’ve actually written it. Contributors have submitted several ideas to help the Wikimedia Foundation build upon the tool as well, with one user suggesting asking suspected AI authors to specify how much was generated by a chatbot.
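To make that behavior concrete, here is a toy sketch of the kinds of prompts described above; the thresholds and the check for wikitext <ref> tags are invented for illustration and aren’t MediaWiki’s actual Edit Check or Paste Check implementation.

```python
# Toy sketch of the prompts described in the article; thresholds are invented
# and this is not MediaWiki's actual Edit Check / Paste Check code.

UNCITED_TEXT_THRESHOLD = 300   # characters of new text before nudging for a citation
LARGE_PASTE_THRESHOLD = 500    # characters pasted at once before asking about authorship

def edit_check_prompts(added_text: str, pasted: bool) -> list[str]:
    prompts = []
    # Wikitext citations typically use <ref> tags; treat their absence as "uncited".
    if len(added_text) >= UNCITED_TEXT_THRESHOLD and "<ref>" not in added_text:
        prompts.append("You've added a lot of text without a citation. Add a reference?")
    if pasted and len(added_text) >= LARGE_PASTE_THRESHOLD:
        prompts.append("You pasted a large block of text. Did you write it yourself?")
    return prompts

print(edit_check_prompts("A long unsourced paragraph about the topic. " * 20, pasted=True))
```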

“We’re following along with our communities on what they do and what they find productive,” Miller says. “For now, our focus with using machine learning in the editing context is more on helping people make constructive edits, and also on helping people who are patrolling edits pay attention to the right ones.”
