You have probably heard of, or even used, ChatGPT at this point. OpenAI’s new magic tool is there to answer your questions, help you write documents, produce executable code, suggest recipes based on the ingredients you have, and much more, all with human-like fluency.
ChatGPT is probably the most famous example of large language models (LLMs). These models are trained on large-scale data sets and can understand requests and generate text responses to them. And when we say large-scale, we mean it.
As these LLMs become more capable, we may need a way to tell whether a given text was written by one of them or by a human. “But why?” you might ask. Although these tools are extremely useful for augmenting our abilities, we cannot expect everyone to use them innocently; there are malicious use cases we cannot allow.
For example, they can be used to generate fake news, and ChatGPT can be really convincing. Imagine your Twitter feed flooded with LLM bots spreading the same misinformation, all of them sounding realistic. That could be a big problem. Academic writing assignments are no longer safe either. How can you be sure whether an article was written by the student or by an LLM? In fact, how can you be sure ChatGPT hasn’t written this very article? (PS: it hasn’t.)
On the other hand, LLMs are trained on data obtained from the Internet. What happens when most of that data is AI-generated synthetic content? That would lower the quality of future LLMs, since synthetic data is often inferior to human-generated content.
We could keep talking about why detecting AI-generated content matters, but let’s stop here and think about how it can be done. Since we’re talking about LLMs, why not ask ChatGPT what it recommends for detecting AI-generated text?
Thanks to ChatGPT for its honest answer, but neither of these approaches gives us great confidence in detection.
Fake content is not a new problem. We’ve dealt with it for years wherever the stakes are high. Counterfeit money, for example, was once a big problem, but today we can be 99% sure that our money is genuine. How? The answer is hidden inside the money itself. You’ve probably noticed the little numbers and symbols that are only visible under certain conditions. These are watermarks: a hidden signature embedded by the mint that attests to the note’s authenticity.
Well, since we have a method that has proven useful in multiple domains, why not apply it to AI-generated content? This was the very idea the authors of this paper had, and they came up with a convenient solution.
They propose watermarking the LLM’s output. Here, the watermark is a hidden pattern in the generated text: statistically unlikely to be produced by a human writer and invisible to human readers, yet algorithmically detectable as evidence that an LLM wrote the text. The watermarking algorithm can be public, so that anyone can check whether a certain LLM produced a given text, or it can be kept private, so that only the LLM’s owners can run the check.
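The article doesn’t spell out the mechanism, but the paper’s scheme is, roughly, a “green list” watermark: at each decoding step, the previous token seeds a pseudorandom partition of the vocabulary, and the logits of the “green” portion are gently boosted before sampling. Below is a minimal Python sketch of that idea; the function names, the hash-based seeding, and the parameter values `GAMMA` and `DELTA` are our own illustrative choices, not the paper’s exact implementation.

```python
import hashlib

import numpy as np

GAMMA = 0.5   # fraction of the vocabulary on the "green" list (illustrative)
DELTA = 2.0   # logit boost given to green-list tokens (illustrative)

def green_list(prev_token_id: int, vocab_size: int, gamma: float = GAMMA) -> set:
    """Deterministically derive a green list from the previous token.

    Hashing the previous token seeds a PRNG, so the generator and the
    detector can reconstruct the exact same vocabulary partition without
    sharing any state beyond the (possibly secret) hashing scheme.
    """
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % 2**32
    rng = np.random.default_rng(seed)
    return set(rng.permutation(vocab_size)[: int(gamma * vocab_size)].tolist())

def watermarked_logits(logits: np.ndarray, prev_token_id: int,
                       delta: float = DELTA) -> np.ndarray:
    """Softly bias sampling toward green-list tokens by boosting their logits."""
    boosted = logits.copy()
    green = list(green_list(prev_token_id, len(logits)))
    boosted[green] += delta  # fancy indexing: add delta to every green logit
    return boosted
```

At every decoding step, the model’s raw logits would pass through `watermarked_logits` before softmax and sampling. Because only the logits are touched, this wraps around any existing model, which is exactly the no-retraining property highlighted below.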
Furthermore, the proposed watermark can be integrated into any LLM without retraining. It can also be detected from a small portion of the generated text, so someone cannot generate a long text and quote only parts of it to avoid detection. Finally, removing the watermark requires modifying the text significantly; minor edits will not escape detection.
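Detection is what makes the “small portion of text” claim work: the detector recomputes the green list at each position and runs a one-proportion z-test on how often the observed tokens land in it. Here is a hedged sketch building on `green_list` and `GAMMA` from the block above; the threshold of 4.0 is an illustrative choice.

```python
import math

def detect_watermark(token_ids: list[int], vocab_size: int,
                     gamma: float = GAMMA, z_threshold: float = 4.0) -> bool:
    """One-proportion z-test: do green-list hits exceed the chance rate gamma?"""
    hits = sum(
        cur in green_list(prev, vocab_size, gamma)
        for prev, cur in zip(token_ids, token_ids[1:])
    )
    n = len(token_ids) - 1  # number of (previous token, current token) pairs
    z = (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
    return z > z_threshold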
The proposed watermarking algorithm works well, but it is not perfect, and the authors discuss certain attacks against it. For example, the LLM can be prompted to insert a specific emoji after every word; removing those emojis from the generated text then defeats the watermark, because each token’s hidden pattern was derived from the emoji that preceded it during generation, not from the previous word the detector now sees.
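To see why this evades the detector sketched above, here is a small simulation under the same assumptions; the vocabulary size, the emoji’s token id, and the generation loop are all hypothetical stand-ins, not anything from the paper.

```python
import random

VOCAB_SIZE = 50_000  # hypothetical vocabulary size
EMOJI_ID = 7         # hypothetical token id of the inserted emoji

def attacked_generation(num_words: int) -> list[int]:
    """Simulate a model prompted to emit an emoji after every word.

    Each word token is drawn from the green list of the token that actually
    precedes it during generation -- the emoji -- not the previous word.
    """
    emoji_greens = sorted(green_list(EMOJI_ID, VOCAB_SIZE))
    tokens = []
    for _ in range(num_words):
        tokens.append(random.choice(emoji_greens))
        tokens.append(EMOJI_ID)
    return tokens

# Strip the emojis, as the attacker would, then run detection.
stripped = [t for t in attacked_generation(200) if t != EMOJI_ID]
print(detect_watermark(stripped, VOCAB_SIZE))  # almost always False: after
# stripping, consecutive words hit each other's green lists only at the
# chance rate gamma, so the z-score stays near zero
```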
The rise of capable LLMs makes many tasks easier, but it also poses certain threats. This paper proposed a method for identifying LLM-generated text using a watermark.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 13k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, exciting AI projects, and more.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.