Researchers’ position (their perspectives shaped by their own experience, identity, culture, and background) influences their design decisions as they develop NLP data sets and models.
Latent design choices and the position of the researcher are two sources of design bias in the production of data sets and models, and they lead to discrepancies in how well those data sets and models perform for different populations. By imposing the standards of one group on the rest of the world, such biases can also help maintain systemic inequalities. Characterizing them is difficult because a wide variety of design decisions must be made when building data sets and models, and only a subset of those decisions is ever recorded. Additionally, many models widely used in production are not exposed outside of their APIs, making direct characterization of design biases even harder.
Recent research from the University of Washington, Carnegie Mellon University, and the Allen Institute for AI introduces NLPositionality, a framework for characterizing the positionality and design biases of data sets and natural language processing (NLP) models. The researchers recruit a global community of volunteers from diverse cultural and linguistic backgrounds through the LabintheWild platform to annotate a sample data set. They then measure design biases by contrasting different identities and contexts to see which ones are more in line with the original data set labels or model predictions.
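The core measurement here is an alignment score: for each demographic group, the group's annotations are compared against the data set's original labels (or a model's predictions). The snippet below is a minimal, hypothetical sketch of that idea in Python; the column names, toy data, and the use of Pearson correlation are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch: quantify how well each demographic group's annotations
# align with a data set's original labels. Column names, toy values, and the
# choice of Pearson correlation are assumptions for illustration only.
import pandas as pd
from scipy.stats import pearsonr

# Each row: one volunteer annotation of one instance, plus the annotator's
# self-reported demographic attribute (e.g., country or education level).
annotations = pd.DataFrame({
    "instance_id": [0, 0, 1, 1, 2, 2],
    "group":       ["A", "B", "A", "B", "A", "B"],
    "rating":      [1.0, -1.0, 0.0, 1.0, -1.0, -1.0],  # e.g., acceptability score
})

# Original labels of the data set (or predictions of the model) being audited.
original_labels = pd.Series({0: 1.0, 1: 0.5, 2: -1.0}, name="label")

def group_alignment(annotations: pd.DataFrame, labels: pd.Series) -> pd.Series:
    """For each demographic group, correlate its mean rating per instance with
    the original labels; a higher correlation means closer alignment."""
    results = {}
    for group, sub in annotations.groupby("group"):
        mean_ratings = sub.groupby("instance_id")["rating"].mean()
        joined = mean_ratings.to_frame().join(labels, how="inner")
        r, _p = pearsonr(joined["rating"], joined["label"])
        results[group] = r
    return pd.Series(results, name="pearson_r")

print(group_alignment(annotations, original_labels))
```

In the study itself, alignment statistics of this kind are computed across many demographic attributes (such as country, education, and age) to identify which populations a given data set or model best reflects.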
NLPositionality has three benefits over other methods (such as paid crowdsourcing or in-lab experiments):
- Compared to other crowdsourcing platforms and conventional lab studies, LabintheWild has a more diverse participant population.
- Instead of relying on monetary compensation, the method draws on participants' intrinsic motivation to learn about themselves. This creates learning opportunities for participants and improves data quality compared to paid crowdsourcing platforms. Because participation is free, the platform can, unlike paid one-time studies, keep collecting new annotations and so reflect more recent observations of design bias over long periods of time.
- The method can be applied post hoc to any data set or model and does not require additional labels or pre-existing predictions.
The researchers apply NLPositionality to two NLP tasks known to carry design biases: social acceptability and hate speech detection. They examine task-specific supervised models and their associated data sets as well as general-purpose large language models (e.g., GPT-4). As of May 25, 2023, 1,096 annotators from 87 countries had contributed 16,299 annotations, an average of 38 annotations per day. The team found that the data sets and models they examine align best with college-educated white millennials from English-speaking countries, a subset of the "WEIRD" (Western, educated, industrialized, rich, democratic) population. Their observation that data sets align closely with their original annotators also underscores the importance of collecting data and annotations from a wide range of sources. The findings indicate the need to expand NLP research to include more diverse models and data sets.
Check out the Paper and GitHub for more details.
Dhanshree Shenwai is a Computer Engineer with solid experience in FinTech companies, covering the Finance, Cards & Payments, and Banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today's changing world that make everyone's life easier.