In recent years, Question Answering (QA) has attracted a great deal of interest within natural language processing research. Information Retrieval (IR) systems, also known as retrievers, and Machine Reading Comprehension (MRC) systems, also known as readers, make up the bulk of the QA pipeline. The pipeline input is typically a query and a large collection of documents, from which the retriever selects the passages relevant to the query. The reader component then examines these passages to extract an accurate answer, which is returned as the final output of the pipeline. With better pretrained language models and more advanced algorithms for the reading and retrieval components, the field of QA research has made remarkable progress.
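The two-stage pipeline described above can be illustrated with a minimal sketch. This is not PrimeQA code: the function names (`tokenize`, `retrieve`, `read`, `answer`) are hypothetical, and simple token overlap stands in for the neural retrievers and readers a real system would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=1):
    """Toy retriever (assumption): rank documents by shared query tokens."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for doc in documents:
        doc_tokens = Counter(tokenize(doc))
        # Multiset intersection counts how many query tokens the document covers.
        overlap = sum((query_tokens & doc_tokens).values())
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def read(query, passage):
    """Toy extractive reader (assumption): pick the best-matching sentence."""
    query_tokens = set(tokenize(query))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(query_tokens & set(tokenize(s))))


def answer(query, documents):
    """Run the pipeline: retrieve the top passage, then read an answer from it."""
    best_passage = retrieve(query, documents, top_k=1)[0]
    return read(query, best_passage)
```

Given a small document collection and the query "Where is the Eiffel Tower?", `answer` first narrows the collection to the most relevant passage and then returns the sentence from that passage that best matches the query. The design point is the separation of concerns: the retriever only has to be good at narrowing the search space, while the reader only has to be good at locating the answer inside a short passage.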
Although the field of QA has advanced rapidly in recent years, there is still a lot of room for improvement. In particular, there is currently no centralized repository that makes it easy for researchers to train and analyze various state-of-the-art models in large-scale QA experiments. To create a comprehensive solution for QA research, with the long-term goal of democratizing the field through easy replicability, a team at IBM Research AI developed PrimeQA, ‘The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development’. It is an open source repository that provides academics and researchers with the tools needed to quickly and easily create a custom QA application. Using PrimeQA, a researcher can obtain pre-trained models from various online sources and use them to reproduce the experiments described in papers published at recent NLP conferences.
The PrimeQA repository was designed around several principles, including reproducibility, customizability, and reusability. For reproducibility, users can combine different approaches with their respective add-on modules, for example pairing a reader with a retriever as is done in various QA pipelines, to replicate state-of-the-art published results. For customizability, researchers can extend the models to suit the needs of their applications and use their own data, provided it follows one of the repository’s supported data formats. For reusability, PrimeQA includes many reusable components that let developers deploy pre-trained models out of the box. As a result, there is less need to modify code, which saves time and labor. Additionally, PrimeQA models are built on top of Transformers, making them easy to integrate with Hugging Face Datasets and the Model Hub.
PrimeQA is a comprehensive toolbox consisting of easy-to-use implementations of state-of-the-art retrievers and readers that sit at the top of major QA leaderboards. Users can train these models, run inference with them, and evaluate their performance. In addition, several sister repositories offer tools to link different retrievers and readers and to create a front-end user interface (UI) for clients. PrimeQA supports the core QA functions of information retrieval and reading comprehension, as well as auxiliary capabilities such as question generation, each described below:
1. Information retrieval: PrimeQA supports both dense retrievers (such as ColBERT) and sparse retrievers (such as BM25). A single Python script lets users switch between retrieval algorithms by passing different arguments.
2. Reading comprehension: Given a query and a retrieved passage, the reader component predicts an answer that is either extracted directly from the context or generated from it. PrimeQA enables training and inference of both extractive and generative readers through a single Python script.
3. Question generation: Question generation (QG) is a powerful method for improving the generalizability of QA models. PrimeQA’s QG component is built on modern sequence-to-sequence generation architectures and accepts both structured and unstructured input text via a single Python script.
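To make the sparse retrieval mentioned in the first item more concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration only and not PrimeQA's implementation: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query.

    k1 controls term-frequency saturation and b controls length
    normalization; both defaults are common choices, assumed here.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Smoothed IDF: rarer terms contribute more, never negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores yields a sparse ranking based on exact term matches. A dense retriever such as ColBERT would instead compare learned vector representations of the query and the passages, which is why a toolkit benefits from exposing both families behind a common interface.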
In short, PrimeQA is an open source library created by QA researchers and developers to make it easy to reproduce and reuse past and present work. With contributions from leading academic institutions, PrimeQA already has a strong developer community and welcomes participation from newcomers and professionals alike. Its reusability and ease of access have attracted considerable attention, positioning the library as a key tool for advancing QA technology across the community.
All credit for this research goes to the researchers of this project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing, and web development. She likes to learn more about the technical field by participating in various challenges.