High-quality labeled data is essential for many NLP applications, particularly for training classifiers or evaluating the performance of unsupervised models. For example, scholars often need to classify texts into thematic or conceptual categories, filter noisy social media data for relevance, or measure stance and sentiment. Whether supervised, semi-supervised, or unsupervised methods are used for these tasks, labeled data is needed to provide a training set or a benchmark against which results can be compared. Such data may be available for high-level tasks like semantic analysis or hate speech detection, and occasionally for more specialized targets like party ideology.
Researchers usually have to produce original annotations to verify that the labels match their conceptual categories. Until recently, there were only two basic approaches. First, researchers can hire and train coders, such as research assistants. Second, they can rely on crowd workers on platforms like Amazon Mechanical Turk (MTurk). The two approaches are often combined: trained annotators produce a small gold-standard dataset, while crowd workers augment the labeled data at scale. Each approach has its own advantages and disadvantages. Trained annotators tend to produce high-quality data, but their services are expensive.
Crowd workers, by contrast, are far cheaper and more flexible, but the quality can be insufficient, especially for complex tasks and languages other than English, and there have been concerns about the declining quality of MTurk data. Other platforms such as CrowdFlower and Figure Eight are no longer viable options for academic research after being acquired by Appen, a business-focused company. Researchers at the University of Zurich examine the potential of large language models (LLMs) for text-annotation tasks, with a particular emphasis on ChatGPT, which was released to the public in November 2022. They show that zero-shot ChatGPT annotations (i.e., without any additional training) outperform MTurk annotations at a fraction of the cost.
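Zero-shot annotation here means sending each text to the model together with the task instructions alone, with no labeled examples. A minimal sketch of how such a prompt might be assembled; the codebook text, labels, and tweet below are invented for illustration, not the authors' actual materials:

```python
def build_zero_shot_prompt(codebook: str, labels: list[str], text: str) -> str:
    """Assemble a zero-shot annotation prompt: task instructions,
    the allowed labels, and the text to classify -- no examples."""
    return (
        f"{codebook}\n\n"
        f"Classify the following tweet into exactly one of these "
        f"categories: {', '.join(labels)}.\n\n"
        f"Tweet: {text}\n"
        f"Answer with the category name only."
    )

# Hypothetical relevance task
prompt = build_zero_shot_prompt(
    codebook="You are annotating tweets about content moderation policy.",
    labels=["relevant", "irrelevant"],
    text="Platforms should explain why posts get taken down.",
)
print(prompt)
```

The resulting string would then be sent to the model's chat endpoint, and the returned category name recorded as the annotation.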
LLMs have performed very well on a number of tasks, including classifying legislative proposals, ideological scaling, solving cognitive psychology problems, and emulating human samples for survey research. Although some studies have suggested that ChatGPT might be capable of performing the kind of text-annotation tasks they specified, to the best of the researchers' knowledge a systematic evaluation had not yet been carried out. For their analysis, they used a sample of 2,382 tweets collected for a previous study. For that project, the tweets had been annotated by trained annotators (research assistants) for five separate tasks: relevance, stance, topics, and two kinds of frame detection.
They submitted the same tasks to MTurk crowd workers and to ChatGPT as zero-shot annotations, using the identical codebooks they had created to train their research assistants. They then evaluated ChatGPT's performance against two benchmarks: (i) its accuracy relative to crowd workers; and (ii) its intercoder agreement relative to both crowd workers and the trained annotators. They find that ChatGPT's zero-shot accuracy is higher than MTurk's for four of the five tasks, and that ChatGPT's intercoder agreement exceeds that of both MTurk and the trained annotators for all tasks.
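The two benchmarks can be made concrete with toy numbers (the labels below are invented for illustration): accuracy is agreement with the trained annotators' gold labels, while intercoder agreement, in its simplest form, is the share of items on which two independent coders (or model runs) assign the same label.

```python
def accuracy(pred, gold):
    """Share of items where the predicted label matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def percent_agreement(coder_a, coder_b):
    """Simple intercoder agreement: share of items labeled identically."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

# Toy example with invented labels for a relevance task
gold    = ["relevant", "irrelevant", "relevant", "relevant"]
chatgpt = ["relevant", "irrelevant", "relevant", "irrelevant"]
run2    = ["relevant", "irrelevant", "relevant", "relevant"]

print(accuracy(chatgpt, gold))           # 0.75
print(percent_agreement(chatgpt, run2))  # 0.75
```

In practice, chance-corrected measures such as Cohen's kappa or Krippendorff's alpha are often preferred over raw percent agreement, since they discount agreement expected by chance.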
Furthermore, ChatGPT is far cheaper than MTurk: the five classification tasks cost approximately $68 on ChatGPT (25,264 annotations) versus $657 on MTurk (12,632 annotations). ChatGPT therefore costs about $0.003 per annotation, or a third of a penny, making it roughly twenty times cheaper than MTurk while delivering higher quality. At this cost, it becomes feasible to annotate entire samples or to build sizable training sets for supervised learning.
As an illustration, they estimate that 100,000 annotations would cost approximately $300. These findings show how ChatGPT and other LLMs could change the way researchers annotate data and disrupt parts of the business model of platforms like MTurk. However, more research is needed to understand how ChatGPT and other LLMs perform in broader contexts.
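The cost claims above reduce to simple per-annotation arithmetic, which can be checked directly from the figures reported in the article:

```python
# Totals reported in the article
chatgpt_total, chatgpt_n = 68.0, 25_264
mturk_total, mturk_n = 657.0, 12_632

chatgpt_per = chatgpt_total / chatgpt_n  # per-annotation cost on ChatGPT
mturk_per = mturk_total / mturk_n        # per-annotation cost on MTurk

print(round(chatgpt_per, 4))             # 0.0027 -> under $0.003
print(round(mturk_per / chatgpt_per))    # 19 -> roughly twenty times cheaper
print(round(100_000 * 0.003))            # 300 -> ~$300 for 100,000 annotations
```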
Check out the Paper. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.