ChatGPT has made it easy to produce fluent text on a wide range of topics. But how good is that text, really? Language models are prone to factual errors and hallucinations, so readers want to know whether such tools were used to write news articles or other informative texts before deciding whether to trust a source. The rise of these models has also raised concerns about the authenticity and originality of writing, and many educational institutions have restricted the use of ChatGPT because its content is so easy to produce.
LLMs like ChatGPT generate responses based on patterns in the vast amount of text they were trained on. They do not reproduce answers word for word; instead, they generate new content by predicting the most likely continuation of a given input. Because responses draw on and synthesize information from the training data, they can end up resembling existing content. It is also important to note that while LLMs aim for originality and precision, they are not infallible: users should exercise discretion and not rely solely on AI-generated content for critical decisions or situations that require expert advice.
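The core mechanism here is next-token prediction. The sketch below shows what that looks like in practice, using Hugging Face Transformers with GPT-2 as a stand-in; the model choice and prompt are illustrative assumptions, not part of the research discussed in this article.

```python
# A minimal sketch of next-token prediction with Hugging Face Transformers.
# "gpt2" is an illustrative choice; any causal language model works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Language models generate text by"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The distribution over the next token is the softmax of the last position's logits.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {prob:.3f}")
```

Sampling one of these candidate tokens and appending it to the input, over and over, is all that "generating new content" means at the mechanical level.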
There are many detection frameworks, such as DetectGPT and GPTZero, designed to identify whether an LLM generated a piece of content. However, their performance degrades on datasets other than the ones they were originally evaluated on. Researchers from the University of California, Berkeley, present Ghostbuster, a detection method based on structured search and linear classification.
Ghostbuster uses a three-stage training process: probability computation, feature selection, and classifier training. First, it converts each document into a series of vectors by computing per-token probabilities under several language models. Next, it selects features by running a structured search over a space of vector and scalar functions: it defines a set of operations that combine the per-token probabilities and then performs feature selection over the resulting combinations. Finally, it trains a simple classifier on the best probability-based features plus a handful of additional, manually selected features.
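To make the structured-search idea concrete, here is a minimal sketch. The probability vectors, the operation set, and the exhaustive enumeration are illustrative assumptions; the paper's actual search is deeper, uses per-token probabilities from specific weaker models, and selects features based on held-out performance.

```python
import numpy as np
from itertools import product

# Hypothetical per-token probability vectors for one document, from two
# different language models (in Ghostbuster these come from weaker LMs;
# random values stand in for them here).
p_a = np.random.rand(200)
p_b = np.random.rand(200)

# An illustrative operation set: pairwise vector ops followed by scalar reductions.
vector_ops = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "div": lambda x, y: x / (y + 1e-9),
}
scalar_ops = {
    "mean": np.mean,
    "max": np.max,
    "var": np.var,
}

# Enumerate the (small) space of candidate features. The real search explores
# deeper compositions and keeps only features that help a validation set.
features = {}
for (v_name, v_op), (s_name, s_op) in product(vector_ops.items(), scalar_ops.items()):
    features[f"{s_name}({v_name}(p_a, p_b))"] = s_op(v_op(p_a, p_b))

for name, value in features.items():
    print(f"{name} = {value:.4f}")
```

Each surviving expression becomes one scalar feature per document, which is what the final classifier consumes.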
Ghostbuster's classifiers are trained on combinations of the probability-based features chosen through structured search, plus seven additional features based on word length and the highest token probabilities. These extra features are intended to capture qualitative heuristics observed in AI-generated text.
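The final stage is deliberately simple. Below is a hedged sketch of what training such a classifier could look like; the feature matrix is a random placeholder, and logistic regression is used as a representative simple linear classifier rather than a claim about the exact model in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: each row is a document, columns are the
# probability-based features found by search plus handcrafted ones
# (e.g., average word length, counts of unusually high token probabilities).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # placeholder feature values
y = rng.integers(0, 2, size=1000)  # 1 = AI-generated, 0 = human-written

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple linear classifier over the selected features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```

Keeping the classifier this simple is the point: the heavy lifting is done by the features, which makes the detector cheaper and easier to generalize than fine-tuning a large neural model end to end.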
Ghostbuster’s performance improvements over previous models are robust to how similar the training and test datasets are. Ghostbuster averaged 97.0 F1 across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. It also beat the RoBERTa baseline in all domains except out-of-domain creative writing, and RoBERTa performed much worse out of domain overall. The F1 score is a commonly used metric for evaluating classification models: it combines precision and recall into a single value and is particularly useful for imbalanced datasets.
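For reference, F1 is the harmonic mean of precision and recall. The toy labels below are made up purely to illustrate the computation:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels: 1 = AI-generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
# F1 = 2 * p * r / (p + r), the harmonic mean of precision and recall.
print(f"precision={p:.3f}, recall={r:.3f}, F1={f1_score(y_true, y_pred):.3f}")
```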
Check out the paper and blog article. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing his Master’s degree in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at the fundamental level leads to new discoveries, which in turn lead to the advancement of technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, machine learning models, and artificial intelligence.