We are releasing a classifier capable of distinguishing between AI-written and human-written text.
We have trained a classifier to distinguish between human-written text and AI-written text from a variety of providers. While it is impossible to reliably detect all AI-written text, we believe good classifiers can inform mitigations for false claims that AI-generated text was written by a human: for example, running automated disinformation campaigns, using AI tools for academic dishonesty, and positioning an AI chatbot as a human being.
Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “probably AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). Our classifier’s reliability typically improves as the length of the input text increases. Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
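To illustrate how rates like these are measured, here is a minimal sketch of computing a detector’s true positive rate and false positive rate over a labeled evaluation set. The function and the toy data are hypothetical, not our evaluation pipeline:

```python
def detection_rates(labels, predictions):
    """Compute (TPR, FPR) for a binary AI-text detector.

    labels:      True where a text is AI-written, False where human-written.
    predictions: True where the classifier says "probably AI-written".
    """
    tp = sum(l and p for l, p in zip(labels, predictions))
    fp = sum((not l) and p for l, p in zip(labels, predictions))
    ai_total = sum(labels)
    human_total = len(labels) - ai_total
    tpr = tp / ai_total      # fraction of AI-written text correctly flagged
    fpr = fp / human_total   # fraction of human-written text wrongly flagged
    return tpr, fpr

# Toy evaluation set: 4 AI-written texts followed by 4 human-written texts.
labels      = [True, True, True, True, False, False, False, False]
predictions = [True, False, False, False, True, False, False, False]
tpr, fpr = detection_rates(labels, predictions)
print(tpr, fpr)  # 0.25 0.25
```

Note that the two rates are computed over disjoint populations (AI-written vs. human-written texts), which is why a detector can have a low true positive rate and a low false positive rate at the same time.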
We are making this classifier publicly available to get feedback on whether imperfect tools like this are useful. Our work on detecting AI-generated text will continue, and we look forward to sharing improved methods in the future.
Try our free work-in-progress classifier for yourself:
Limitations
Our classifier has a number of important limitations. It should not be used as a primary decision-making tool, but as a complement to other methods of determining the source of a text.
- The classifier is very unreliable on short texts (less than 1,000 characters). Even the longest texts are sometimes mislabeled by the classifier.
- Sometimes our classifier will incorrectly but confidently label human-written text as AI-written.
- We recommend using the classifier only for English text. It performs significantly worse in other languages and is unreliable on code.
- Text that is highly predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or by humans, because the correct answer is always the same.
- AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained in response to successful attacks, but it is unclear whether detection holds an advantage in the long term.
- Neural network-based classifiers are known to be poorly calibrated outside of their training data. For inputs that are very different from the text in our training set, the classifier is sometimes very confident in an incorrect prediction.
Training the classifier
Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as pretraining data and human demonstrations on prompts submitted to InstructGPT. We split each text into a prompt and a response. On these prompts, we generated responses from a variety of language models trained by us and by other organizations. For our web application, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as probably AI-written if the classifier is very confident.
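The thresholding step described above can be sketched as follows. This is an illustrative example, not our production implementation; the scores and the target rate are hypothetical:

```python
def choose_threshold(human_scores, max_fpr=0.09):
    """Return a confidence threshold such that the fraction of known
    human-written texts scoring at or above it is at most max_fpr.

    human_scores: classifier scores in [0, 1] on held-out texts known to
    be human-written, where higher means "more likely AI-written".
    """
    scores = sorted(human_scores, reverse=True)
    allowed = int(max_fpr * len(scores))  # tolerable false positives
    if allowed >= len(scores):
        return 0.0                        # every text may be flagged
    # Put the threshold just above the (allowed + 1)-th highest human
    # score, so at most `allowed` human texts get flagged as AI-written.
    return scores[allowed] + 1e-9

# Hypothetical held-out scores on 10 human-written texts.
human_scores = [0.95, 0.40, 0.35, 0.30, 0.22, 0.20, 0.15, 0.10, 0.08, 0.05]
t = choose_threshold(human_scores, max_fpr=0.10)
flagged = sum(s >= t for s in human_scores)
print(t, flagged)  # threshold just above 0.40; exactly 1 human text flagged
```

Raising the threshold this way trades true positives for fewer false positives: the stricter the threshold, the more AI-written text slips through unflagged, which is one reason the true positive rate reported above is modest.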
Impact on Educators and Request for Input
We recognize that identifying AI-written text has been a major point of discussion among educators, and that recognizing the limits and impacts of AI-generated text classifiers in the classroom is equally important. We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations. While this resource is focused on educators, we expect our classifier and associated classifier tools to also be useful to journalists, misinformation and disinformation researchers, and other groups.
We’re engaging with educators across the US to learn what they’re seeing in their classrooms and discuss the capabilities and limitations of ChatGPT, and we’ll continue to expand our reach as we learn. These are important conversations to have as part of our mission to implement large language models safely, in direct contact with affected communities.
If you are directly affected by these issues (including, but not limited to, teachers, administrators, parents, students, and educational service providers), please provide us with feedback using this form. Direct feedback on the preliminary resource is helpful, and we also welcome any resources that educators are developing or have found useful (e.g., course guidelines, honor code and policy updates, interactive tools, AI literacy programs).