Google announced via a post on X (formerly Twitter) on Wednesday that SynthID is now available to anybody who wants to try it. The authentication system for AI-generated content embeds imperceptible watermarks into generated images, video, and text, enabling users to verify whether a piece of content was made by humans or machines.
“We’re open-sourcing our SynthID Text watermarking tool,” the company wrote. “Available freely to developers and businesses, it will help them identify their AI-generated content.”
SynthID debuted in 2023 as a means to watermark AI-generated images, audio, and video. It was initially integrated into Imagen, and the company subsequently announced its incorporation into the Gemini chatbot this past May at I/O 2024.
The system works by embedding imperceptible watermarks in tokens during the text generation process. Tokens are the foundational chunks of data (a single character, a word, or part of a phrase) that a generative AI uses to understand the prompt and predict the next word in its reply. According to a DeepMind blog post from May, SynthID does so by “introducing additional information in the token distribution at the point of generation by modulating the likelihood of tokens being generated.”
By comparing the model’s word choices, along with their “adjusted probability scores,” against the expected pattern of scores for watermarked and unwatermarked text, SynthID can detect whether an AI wrote a given passage.
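To make the idea of "modulating the likelihood of tokens" concrete, here is a minimal Python sketch. It is not SynthID's actual algorithm. It swaps in a simplified "green list" scheme of the kind described in the academic watermarking literature, and the vocabulary, the key, the uniform stand-in "model," and every function name are hypothetical choices made for illustration.

```python
import hashlib
import math
import random

# All names and values below are illustrative assumptions, not SynthID's API.
VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary
KEY = "demo-key"                        # hypothetical secret watermark key
GAMMA = 0.5                             # share of tokens treated as "green"
BIAS = 4.0                              # logit boost given to green tokens


def is_green(prev_token, token, key=KEY):
    """Pseudo-randomly mark a token green, keyed on the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def sample_next(prev_token, rng, watermark):
    """Sample one token; a uniform distribution stands in for a real model."""
    logits = [0.0] * len(VOCAB)
    if watermark:
        # Modulate the likelihood: nudge green tokens upward before sampling.
        logits = [
            logit + BIAS if is_green(prev_token, tok) else logit
            for logit, tok in zip(logits, VOCAB)
        ]
    weights = [math.exp(logit) for logit in logits]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(length, seed, watermark):
    """Generate a toy token sequence with or without the watermark."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(length):
        tokens.append(sample_next(tokens[-1], rng, watermark))
    return tokens[1:]


def z_score(tokens):
    """Measure how far the green-token count sits above the chance level."""
    prev, green = "<s>", 0
    for tok in tokens:
        green += is_green(prev, tok)
        prev = tok
    n = len(tokens)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


watermarked = generate(200, seed=0, watermark=True)
plain = generate(200, seed=0, watermark=False)
print(f"watermarked z = {z_score(watermarked):.1f}")
print(f"plain z       = {z_score(plain):.1f}")
```

In this toy setup, the detector needs only the key and the text, not the model: a score far above zero suggests the watermark is present, while unwatermarked text should score near zero.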
Google DeepMind (@GoogleDeepMind), October 23, 2024: “Here’s how SynthID watermarks AI-generated content across modalities.” pic.twitter.com/CVxgP3bnt2
According to a study published in Nature on Wednesday, this process does not affect the response’s accuracy, quality, or speed, and it cannot be easily bypassed. Unlike standard metadata, which can be easily stripped or erased, SynthID’s watermark reportedly remains even if the content has been cropped, edited, or otherwise modified.
“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where [large language model] outputs are near deterministic, such as factual questions or code generation tasks,” Soheil Feizi, an associate professor at the University of Maryland, told MIT Technology Review, noting that its open-source nature “allows the community to test these detectors and evaluate their robustness in different settings, helping to better understand the limitations of these techniques.”
The system is not foolproof, however. While it is resistant to tampering, SynthID’s watermarks can be removed if the text is run through a language translation app or heavily rewritten. It is also less effective on short passages of text and at determining whether a reply to a factual question was generated by AI. For example, there’s only one right answer to the prompt, “What is the capital of France?” and both humans and AI will tell you that it’s Paris.
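The two limitations above, short passages and factual answers, can be illustrated with a little arithmetic. The Python sketch below again assumes a simplified "green list" watermark rather than SynthID's own method, and the function names and numbers are hypothetical.

```python
import math

GAMMA = 0.5  # assumed share of the vocabulary a toy watermark marks "green"


def z_score(green_count, total, gamma=GAMMA):
    """Detection score: how far the green count exceeds the chance level."""
    return (green_count - gamma * total) / math.sqrt(total * gamma * (1 - gamma))


def green_mass(logits, green_mask, bias):
    """Probability of sampling a green token after boosting green logits."""
    boosted = [l + bias if g else l for l, g in zip(logits, green_mask)]
    peak = max(boosted)  # subtract the max for numerical stability
    weights = [math.exp(b - peak) for b in boosted]
    green = sum(w for w, g in zip(weights, green_mask) if g)
    return green / sum(weights)


# Same 70% green rate, different lengths: the short text is inconclusive.
short_z = z_score(7, 10)     # about 1.3
long_z = z_score(140, 200)   # about 5.7

# Four candidate tokens; the second and third happen to be green.
mask = [False, True, True, False]
open_ended = green_mass([0.0, 0.0, 0.0, 0.0], mask, bias=4.0)  # flat choice
factual = green_mass([20.0, 0.0, 0.0, 0.0], mask, bias=4.0)    # one clear answer

print(f"short z = {short_z:.2f}, long z = {long_z:.2f}")
print(f"green mass, open-ended = {open_ended:.3f}, factual = {factual:.2e}")
```

When the model is nearly certain of the next token, as with "Paris," the boost barely shifts the distribution, so there is little room to leave a signal. A short passage provides too few tokens for the signal to stand out from chance.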
If you’d like to try SynthID yourself, it can be downloaded from Hugging Face as part of Google’s updated Responsible GenAI Toolkit.