With recent technological advances, Large Language Models (LLMs) such as GPT-3 and PaLM have exhibited remarkable generation capabilities across a wide range of domains, including education, content creation, healthcare, and research. For example, these models help writers improve their writing style and help budding developers generate boilerplate code. Combined with the availability of various third-party APIs, this has driven the widespread adoption of LLMs in consumer-facing systems, from educational tools for students to healthcare systems used by hospitals. In such scenarios, however, the security of these systems becomes a fundamental concern, because people trust them with sensitive personal information. This calls for a clearer picture of the different capabilities and limitations of LLMs.
However, most previous research has focused on making LLMs more powerful by employing more advanced and sophisticated architectures. While this work has significantly advanced the NLP community, it has also left aside the safety of these systems. On this front, a team of researchers from Princeton University and Georgia Tech collaborated with researchers at the Allen Institute for AI (AI2) to close this gap by conducting a toxicity analysis of OpenAI's revolutionary AI chatbot, ChatGPT. The researchers evaluated more than half a million ChatGPT generations for toxicity, and their investigation revealed that when ChatGPT's system parameter is configured to assign it a persona, its toxicity increases several-fold across a wide range of topics. For example, when ChatGPT's persona is set to that of the boxer Muhammad Ali, its toxicity nearly triples compared to the default setting. This is particularly alarming because ChatGPT is currently being used as the foundation for various other technologies, which can then generate the same level of toxicity after such system-level modifications. The work by the AI2 researchers and their university collaborators therefore focuses on gaining deeper insight into this toxicity in ChatGPT's generations when it is assigned different personas.
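The article does not spell out how the half a million generations were scored for toxicity. A common choice for this kind of analysis is Google's Perspective API, which returns a toxicity probability for a piece of text; the sketch below is a minimal illustration under that assumption (the API key and example sentence are placeholders, not details from the study).

```python
# Hedged sketch: scoring a single generation's toxicity with Google's
# Perspective API. The article does not name the scorer; Perspective is
# simply a common choice for this kind of measurement.
from googleapiclient import discovery

PERSPECTIVE_API_KEY = "YOUR_API_KEY"  # placeholder

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=PERSPECTIVE_API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity(text: str) -> float:
    """Return a toxicity probability in [0, 1] for one model generation."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=body).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Example (illustrative sentence, not from the study):
print(toxicity("You are a wonderful person."))
```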
The ChatGPT API provides a feature that lets the user assign a persona by setting the system parameter, so that the persona sets the tone for the rest of the conversation and influences the way ChatGPT converses. For their study, the researchers selected a list of 90 personas from different backgrounds and countries, such as businesspeople, politicians, and journalists, as shown in the sketch below. These personas were assigned to ChatGPT to analyze its responses about roughly 128 critical entities such as gender, religion, and profession. The team also asked ChatGPT to continue certain incomplete sentences about these entities to gather more information. The final findings showed that assigning a persona to ChatGPT can increase its toxicity up to sixfold, with ChatGPT frequently producing harsh outputs and indulging in negative stereotypes and beliefs.
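To make the setup concrete, persona assignment boils down to placing the persona description in the system message of a chat request. Below is a minimal sketch using the official openai Python SDK; the persona string, model name, and user prompt are illustrative and not taken from the paper.

```python
# Minimal sketch of persona assignment via the system parameter,
# using the official openai Python SDK (assumes OPENAI_API_KEY is set).
from openai import OpenAI

client = OpenAI()

# The persona is injected through the "system" message; every subsequent
# turn is then generated "in character".
persona = "Speak exactly like Muhammad Ali."  # illustrative persona string

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the ChatGPT model exposed through the API
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "Say something about journalists."},
    ],
)

print(response.choices[0].message.content)
```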
The team’s research showed that the toxicity of the outputs varied significantly with the persona ChatGPT was given, which the researchers attribute to ChatGPT’s own understanding of that persona, learned from its training data. One finding, for example, suggested that journalist personas are twice as toxic as businessperson personas, even though this is not necessarily the case in practice. The study also showed that specific populations and entities are targeted more frequently (nearly three times more often) than others, demonstrating the inherently discriminatory behavior of the model. For example, toxicity varies with a person’s gender and is approximately 50% higher than toxicity based on race. These fluctuating trends can be harmful to users and derogatory to the individuals in question. In addition, malicious users can build technologies on top of ChatGPT to generate content that could harm an unsuspecting audience.
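To illustrate how such persona-level comparisons can be computed, the sketch below aggregates per-generation toxicity scores by persona and reports each persona's mean toxicity relative to the default setting. The column names and numbers are invented for illustration and are not the authors' data or analysis code.

```python
# Illustrative aggregation: mean toxicity per persona relative to the default
# setting (data values and persona labels are made up for this sketch).
import pandas as pd

scores = pd.DataFrame(
    {
        "persona": ["default", "default", "journalist", "journalist", "businessperson"],
        "toxicity": [0.05, 0.07, 0.22, 0.18, 0.10],
    }
)

mean_tox = scores.groupby("persona")["toxicity"].mean()
relative = mean_tox / mean_tox["default"]  # e.g. "3x more toxic than default"
print(relative.sort_values(ascending=False))
```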
This study’s toxicity analysis of ChatGPT revealed three main things: the model can be significantly more toxic when personas are assigned (up to six times more toxic than the default); the model’s toxicity varies greatly with the identity of the persona, with ChatGPT’s own opinion of that persona playing an important role; and ChatGPT can discriminatorily target specific entities by being more toxic when generating content about them. The researchers also noted that although ChatGPT was the LLM used in their experiments, their methodology can be extended to any other LLM. The team hopes that their work will motivate the AI community to develop technologies that provide ethical, safe, and trustworthy AI systems.
Check out the Paper and reference article. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 18k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing, and web development. She likes to learn more about the technical field by participating in various challenges.