Long-form question answering (LFQA) aims to provide a complete, exhaustive response to a query. Parametric knowledge stored in large language models (LLMs), together with retrieved documents supplied at inference time, lets LFQA systems compose complex, paragraph-length answers rather than extracting spans from an evidence document. Recent years have revealed both the surprising strength and the fragility of large-scale LLMs' LFQA capabilities. Retrieval has been proposed as a powerful way to supply LMs with relevant, up-to-date information. Yet how retrieval augmentation influences LM generation is still poorly understood, and it does not always have the expected effects.
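To make the contrast concrete, here is a minimal sketch of the two LFQA settings just described: answering from parametric knowledge alone versus prepending retrieved evidence documents in context. The prompt template and example inputs are illustrative placeholders, not the authors' actual setup.

```python
# Sketch: building an LFQA prompt with or without retrieved evidence in context.
def build_prompt(question: str, evidence_docs: list[str] | None = None) -> str:
    """Build an LFQA prompt, optionally prepending in-context evidence documents."""
    if not evidence_docs:
        # Parametric-only setting: the LM must rely on what it memorized during training.
        return (
            "Answer the following question in a detailed paragraph.\n\n"
            f"Question: {question}\nAnswer:"
        )
    # Retrieval-augmented setting: evidence documents are concatenated in the context.
    context = "\n\n".join(f"Document [{i + 1}]: {doc}" for i, doc in enumerate(evidence_docs))
    return (
        "Answer the following question in a detailed paragraph, using the documents below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Usage: hold the LM fixed and swap the evidence documents (relevant, irrelevant, or none)
# to probe how retrieval changes generation.
prompt = build_prompt(
    "Why does the sky appear blue?",
    evidence_docs=["Rayleigh scattering causes shorter wavelengths of light to scatter more strongly."],
)
print(prompt)
```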
Researchers at the University of Texas at Austin investigate how retrieval influences answer generation for LFQA, a challenging long-form text generation problem. Their study sets up two controlled research settings: one in which the LM is held fixed while the evidence documents are varied, and another in which the opposite holds. Because LFQA quality is difficult to assess, they begin by measuring surface indicators (e.g., length, perplexity) associated with different answer attributes such as coherence. An attractive property of retrieval-augmented LFQA systems is the ability to attribute the generated answer to the in-context evidence documents. Newly collected human annotations of sentence-level attribution are used to evaluate off-the-shelf attribution detection methods.
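As a rough illustration of the kind of surface indicators mentioned above, the sketch below computes answer length and perplexity under a scoring LM. GPT-2 is used here purely as an example scorer; the paper's choice of models and exact metrics may differ.

```python
# Sketch: two surface indicators of a generated answer (token length, perplexity).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surface_indicators(answer: str) -> dict:
    """Return the token length and perplexity of a generated answer."""
    input_ids = tokenizer(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        # Using the answer's own tokens as labels gives the mean negative log-likelihood.
        loss = model(input_ids, labels=input_ids).loss
    return {
        "num_tokens": input_ids.shape[1],
        "perplexity": torch.exp(loss).item(),
    }

print(surface_indicators("The sky appears blue because shorter wavelengths scatter more."))
```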
Based on their examination of surface patterns, the team concludes that retrieval augmentation significantly changes LM generation. These effects are not entirely absent even when the presented documents are irrelevant; for example, the length of the generated answers may still change. Unlike irrelevant documents, those that provide salient in-context evidence lead LMs to produce more unexpected sentences. Even with an identical set of evidence documents, retrieval augmentation can affect different base LMs in contrasting ways. Their newly annotated dataset provides a gold standard for benchmarking attribution evaluation. The findings show that NLI models which identify attribution in factoid QA also perform well in the LFQA setting, beating chance by a wide margin but falling short of human agreement by roughly 15% in accuracy.
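The snippet below sketches NLI-based sentence-level attribution scoring in the spirit of the off-the-shelf detectors discussed above. The specific checkpoint (roberta-large-mnli) and threshold-free entailment probability are illustrative assumptions, not necessarily the setup used in the paper.

```python
# Sketch: scoring whether an evidence document entails a generated sentence with an NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
nli_model.eval()

def attribution_score(evidence: str, generated_sentence: str) -> float:
    """Probability that the evidence (premise) entails the generated sentence (hypothesis)."""
    inputs = nli_tokenizer(evidence, generated_sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli_model(**inputs).logits, dim=-1)[0]
    # Look up the entailment index from the model config instead of hard-coding it.
    entail_idx = {v.lower(): k for k, v in nli_model.config.id2label.items()}["entailment"]
    return probs[entail_idx].item()

# A generated sentence counts as attributable if some in-context document entails it.
score = attribution_score(
    "Rayleigh scattering affects shorter wavelengths of light more strongly.",
    "Blue light is scattered more than red light.",
)
print(f"entailment probability: {score:.3f}")
```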
The research shows that, even given an identical set of documents, attribution quality can differ widely between base LMs. The study also sheds light on attribution patterns in long-form generation: the generated text tends to follow the order of the in-context evidence documents, even when that context is a concatenation of several articles, and the final sentence is considerably less attributable than earlier sentences. Overall, the study illuminates how LMs leverage in-context evidence documents to answer in-depth questions and points to promising items for a research agenda.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 31k+ ML SubReddit, 40k+ Facebook community, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.
Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies spanning Finance, Cards & Payments, and Banking, and a keen interest in AI applications. She is excited to explore new technologies and advancements in today's evolving world that make life easier for everyone.