We now have all the ingredients we need to check whether a piece of text was generated by AI. Here is everything we need:
- The text (sentence or paragraph) that we want to check.
- The tokenized version of this text, produced by the same tokenizer that was used to tokenize the model's training dataset.
- The trained language model.
Using the three items above, we can calculate the following (a code sketch follows this list):
- The probability of each token as predicted by the model.
- The perplexity of each token, computed from its probability.
- The perplexity of the entire sentence.
- The perplexity of the model on its training dataset.
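Here is a minimal sketch of how these quantities can be computed, assuming a PyTorch-style model that maps a batch of token ids to raw next-token logits of shape (batch, sequence, vocab). The function name and interface are our own illustration, not the exact code used for this article.

import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexities(model, token_ids):
    # token_ids: 1-D LongTensor of token ids for the sentence, shape (T,).
    logits = model(token_ids.unsqueeze(0)).squeeze(0)  # (T, vocab_size)
    # Position t predicts token t+1, so align logits[:-1] with token_ids[1:].
    log_probs = F.log_softmax(logits[:-1], dim=-1)
    tok_log_probs = log_probs.gather(1, token_ids[1:].unsqueeze(1)).squeeze(1)
    # One common definition: a token's perplexity is the inverse of its
    # predicted probability; the sentence perplexity is the exponential of
    # the mean negative log-likelihood over all predicted tokens.
    token_ppx = torch.exp(-tok_log_probs)
    sentence_ppx = torch.exp(-tok_log_probs.mean())
    return token_ppx, sentence_ppx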
To check whether a text was generated by AI, we compare the perplexity of the sentence with the model's perplexity on its training data, scaled by a factor alpha:

ppx(x) > alpha * ppx(training data)   =>  probably human-written (not AI-generated)
ppx(x) <= alpha * ppx(training data)  =>  possibly AI-generated

ppx(x) in the formula above means the perplexity of the input "x". If the sentence's perplexity is greater than the model's scaled perplexity, then it is probably human-written text (i.e., not AI-generated); otherwise, it is possibly AI-generated. The reasoning is that we expect the model not to be stumped by text it would have generated itself, so if it encounters text it would be unlikely to generate, there is reason to believe the text is not AI-generated. If the sentence's perplexity is less than or equal to the scaled training perplexity, then it was likely generated by this language model, but we cannot be very sure: a human may have written that text, and it may simply be something the model could have generated as well. After all, the model was trained on a large amount of human-written text, so in a sense the model represents "average human writing."
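To make the rule concrete, here is a minimal sketch of the comparison. The value alpha = 1.1 matches the scaling used in the token-coloring code later in this section; the threshold itself is a tunable assumption, not a universal constant.

def is_probably_ai_generated(sentence_ppx, model_training_ppx, alpha=1.1):
    # Human-written text tends to perplex the model more than its own
    # training data does; anything at or below the scaled threshold is
    # flagged as possibly AI-generated.
    return sentence_ppx <= alpha * model_training_ppx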
Next, let's take a look at some examples of human-written versus AI-generated text.
Examples of human-written versus AI-generated text
We have written Python code that colors each token in a sentence based on its perplexity relative to the model's perplexity. The first token is always colored black, since no perplexity can be computed for it. Tokens whose perplexity is less than or equal to the scaled model perplexity are colored red, indicating that they could have been generated by AI, while tokens with higher perplexity are colored green, indicating that they were definitely not generated by AI.
The numbers in square brackets before the sentence indicate the perplexity of the sentence as calculated by the language model. Note that some words are part red and part green; this is because we use a subword tokenizer.
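To illustrate why a single word can straddle two colors, here is a quick, hypothetical example using a GPT-2-style BPE tokenizer from Hugging Face's transformers library (our model uses its own tokenizer; this is just for illustration):

from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
# A single word can map to several subword tokens, each of which gets its
# own perplexity and therefore its own color.
print(tok.tokenize("unbelievable"))  # something like ['un', 'bel', 'iev', 'able']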
Here is the code that generates the HTML above.
def get_html_for_token_perplexity(tok, sentence, tok_ppx, model_ppx):
    # Tokenize the sentence with the same tokenizer used during training.
    tokens = tok.encode(sentence).tokens
    # Replace the BPE space marker (code point 288, 'Ġ') with a real space.
    cleaned_tokens = []
    for word in tokens:
        m = ''.join(' ' if ord(ch) == 288 else ch for ch in word)
        cleaned_tokens.append(m)
    # The first token is always black: it has no preceding context, so no
    # perplexity is computed for it.
    html = [f"<span>{cleaned_tokens[0]}</span>"]
    for ct, ppx in zip(cleaned_tokens[1:], tok_ppx):
        color = "black"
        if ppx.item() >= 0:
            # Red: the model could plausibly have generated this token.
            # Green: the token is too perplexing to be model-generated.
            if ppx.item() <= model_ppx * 1.1:
                color = "red"
            else:
                color = "green"
        html.append(f"<span style='color:{color};'>{ct}</span>")
    return "".join(html)
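A hypothetical call might look like the following, where tok is the Hugging Face tokenizers.Tokenizer used during training, token_ppx comes from the perplexities() sketch shown earlier, and model_training_ppx is the model's perplexity on its training set (all names are illustrative):

html = get_html_for_token_perplexity(
    tok,
    "The quick brown fox jumps over the lazy dog.",
    token_ppx,            # per-token perplexities for the sentence
    model_training_ppx,   # the model's perplexity on its training data
)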
As we can see from the examples above, if the model flags some text as human-written, it is definitely human-written; but if it flags the text as AI-generated, there is a chance that it is not actually AI-generated. Why does this happen? Let's take a look below!
False positives
Our language model is trained on MANY texts written by humans. It is generally difficult to detect whether something was written (digitally) by a specific person. The model's training corpus comprises many different writing styles, written by a large number of people, so the model learns many different styles and kinds of content. It is quite likely that your writing style closely matches the writing style of some text the model was trained on. This is what produces false positives, and it is why the model cannot be sure that a given piece of text was generated by AI. However, the model can be sure that some text was written by a human.
OpenAI: OpenAI recently announced that it would discontinue its tool for detecting AI-generated text, citing a low accuracy rate (source: Hindustan Times, tech.hindustantimes.com/tech/news/openai-kills-off-its-own-ai-text-detection-tool-shocking-reason-behind-it-71690364760759.html).
The original version of the AI classifier tool had certain limitations and inaccuracies from the start. Users had to manually enter at least 1,000 characters of text, which OpenAI then analyzed and classified as either AI- or human-written. Unfortunately, the tool's performance fell short: it correctly identified only 26 percent of AI-generated content and mislabeled human-written text as AI-generated about 9 percent of the time.
Here is the OpenAI blog post (openai.com/blog/new-ai-classifier-for-indicating-ai-written-text). It seems they used a different approach from the one described in this article.
"Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe were written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as likely AI-written if the classifier is very confident."
GPTZero: Another popular AI-generated text detection tool is GPTZero. GPTZero appears to use perplexity and burstiness to detect AI-generated text. "Burstiness refers to the phenomenon where certain words or phrases appear in bursts within a text. In other words, if a word appears once in a text, it is likely to appear again in close proximity" (source: ai/topics/ChatGPT/high-perplexity-score-gpt-zero).
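To make the idea of burstiness concrete, here is a toy probe of our own devising (not GPTZero's actual method): for every repeated word it records the distance between consecutive occurrences, and a smaller average gap suggests burstier text.

def average_repeat_gap(text):
    # For every repeated word, record the distance (in words) between
    # consecutive occurrences; bursty text tends to show many small gaps.
    words = text.lower().split()
    last_seen, gaps = {}, []
    for i, w in enumerate(words):
        if w in last_seen:
            gaps.append(i - last_seen[w])
        last_seen[w] = i
    return sum(gaps) / len(gaps) if gaps else float("inf")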
GPTZero claims a very high success rate. According to the GPTZero FAQ, "With a threshold of 0.88, 85% of AI documents are classified as AI, and 99% of human documents are classified as human."
The generality of this approach
The approach described in this article does not generalize well. What we mean by this is that if you have 3 language models, for example GPT-3, GPT-3.5, and GPT-4, then you have to run the input text through all 3 models and run the perplexity check on all of them to see whether the text could have been generated by any of them. This is because each model generates text slightly differently, and each of them needs to evaluate the text independently to see if it could have generated it.
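Here is a sketch of what such a check could look like, reusing the hypothetical perplexities() helper from earlier; each detector entry carries a model, its tokenizer, and its perplexity on its own training data (all names are illustrative):

import torch

def could_any_model_have_generated(text, detectors, alpha=1.1):
    # detectors: list of (model, tokenizer, training_ppx) triples.
    # The text is flagged as possibly AI-generated if ANY model finds it
    # no more perplexing than alpha times its own training perplexity.
    for model, tok, training_ppx in detectors:
        ids = torch.tensor(tok.encode(text).ids)
        _, sentence_ppx = perplexities(model, ids)
        if sentence_ppx.item() <= alpha * training_ppx:
            return True
    return False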
With large language models proliferating around the world (as of August 2023), it seems infeasible to check a fragment of text against every language model in existence, so it is unlikely that any piece of text can be proven to have originated from a particular one of them.
Indeed, new models are trained every day, and trying to keep up with this rapid progress seems difficult at best.
The following example shows the result of asking our model to predict whether sentences generated by ChatGPT are AI-generated. As you can see, the results are mixed.
There are several reasons why this can happen:
- Training corpus size: Our model was trained on very little text, whereas ChatGPT was trained on terabytes of text.
- Data distribution: Our model was trained on a different data distribution than ChatGPT.
- Fine-tuning: Our model is just a plain GPT model, whereas ChatGPT was fine-tuned for chat-like responses, causing it to generate text in a slightly different tone. If you had a model that generates legal text or medical advice, our model would perform poorly on text generated by that model as well.
- Model size: Our model is very small (under 100 million parameters, compared with more than 200 billion parameters for ChatGPT-like models).
It is clear that we need a better approach if we hope to provide a reasonably high-quality check of whether any given text is AI-generated.
Next, let’s take a look at some misinformation on this topic circulating on the Internet.