The usefulness of artificial intelligence (AI) models, especially in specialized contexts, depends on their ability to access and use prior information. Legal AI tools, for example, need to be well versed in a wide range of prior cases, while customer service chatbots require specific information about the companies they serve. Retrieval-augmented generation (RAG) is a methodology that developers frequently use to improve an AI model’s performance across such domains.
By retrieving relevant information from a knowledge base and adding it to the user’s input, RAG greatly improves AI performance. However, a major drawback of traditional RAG approaches is that they often strip away context during the encoding process, making it difficult to retrieve the most relevant information.
RAG’s reliance on segmenting documents into smaller, more manageable chunks for retrieval can unintentionally discard important context. For example, a user might query a financial knowledge base about a particular company’s revenue growth in a given quarter. A conventional RAG system might retrieve a chunk that says, “Company revenue grew 3% over the previous quarter.” Without context, this excerpt does not indicate which company or which quarter is being discussed, making the retrieved information far less useful.
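The failure mode above can be reproduced in a few lines. This is a minimal sketch, not a real RAG pipeline: the document, the sentence-level splitter, and the word-overlap retriever are all illustrative stand-ins for a production chunker and embedding search.

```python
import string

document = (
    "ACME Corp filed its Q2 2023 report with the SEC. "
    "The company's revenue grew 3% over the previous quarter. "
    "Operating costs remained flat."
)

# Naive chunking: split the document into individual sentences.
chunks = [s if s.endswith(".") else s + "." for s in document.split(". ")]

def words(text):
    # Lowercase and strip punctuation so "quarter." matches "quarter".
    clean = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(clean.split())

def retrieve(query, chunks):
    # Stand-in retriever: return the chunk sharing the most words with the query.
    return max(chunks, key=lambda c: len(words(query) & words(c)))

best = retrieve("how much did revenue grow over the previous quarter", chunks)
print(best)  # → "The company's revenue grew 3% over the previous quarter."
```

The retrieved chunk answers the question but names neither the company nor the quarter; that information lived in a different chunk and was lost at chunking time.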
To overcome this problem, Anthropic introduced a technique known as Contextual Retrieval that significantly increases the retrieval accuracy of RAG systems. It rests on two sub-techniques: Contextual Embeddings and Contextual BM25. By improving the way text chunks are processed and stored, Contextual Retrieval can reduce the rate of failed retrievals by 49%, and, when combined with reranking, by a staggering 67%. These improvements carry over directly to downstream tasks, increasing the effectiveness and reliability of AI models.
For Contextual Retrieval to work, each chunk must first be prepended with chunk-specific explanatory context before it is embedded and before the BM25 index is built. For example, the excerpt “Company revenue grew 3% over the previous quarter” could become: “This chunk is from an SEC filing on ACME Corp’s performance in Q2 2023; the previous quarter’s revenue was $314 million. Company revenue grew 3% over the previous quarter.” With this added context, the system can retrieve and apply the correct information far more reliably.
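The preprocessing step amounts to a simple string operation once the context has been produced. In the sketch below the context string is hard-coded; in a real pipeline it would be generated by an LLM from the full document.

```python
chunk = "Company revenue grew 3% over the previous quarter."

# Hard-coded for illustration; a real pipeline generates this from the
# full source document.
context = (
    "This chunk is from an SEC filing on ACME Corp's performance in Q2 2023; "
    "the previous quarter's revenue was $314 million."
)

# The contextualized chunk -- not the bare chunk -- is what gets embedded
# and added to the BM25 index.
contextualized_chunk = context + " " + chunk
print(contextualized_chunk)
```

A query about ACME’s Q2 2023 revenue now matches this chunk on both the company name and the quarter, which the bare chunk never mentioned.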
Developers can use AI assistants such as Claude to apply Contextual Retrieval across huge knowledge bases. By giving Claude precise instructions, they can generate a short, chunk-specific annotation for every chunk; these annotations are then prepended to the text before it is embedded and indexed.
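Such an instruction might look like the template below. The exact wording is a hypothetical sketch, not Anthropic’s published prompt; the key idea is that the model sees the whole document alongside the chunk it must situate.

```python
def build_context_prompt(document: str, chunk: str) -> str:
    """Assemble a prompt asking an LLM to write chunk-specific context.

    The wording is illustrative; the returned string would be sent to a
    model such as Claude, whose reply is prepended to the chunk.
    """
    return (
        "<document>\n" + document + "\n</document>\n"
        "Here is the chunk we want to situate within the whole document:\n"
        "<chunk>\n" + chunk + "\n</chunk>\n"
        "Give a short, succinct context that situates this chunk within the "
        "overall document, for the purpose of improving search retrieval of "
        "the chunk. Answer only with the succinct context and nothing else."
    )

prompt = build_context_prompt(
    "ACME Corp filed its Q2 2023 report with the SEC. ...",
    "Company revenue grew 3% over the previous quarter.",
)
```

Running this once per chunk is a one-time indexing cost; prompt caching can keep it cheap, since the same document is resent for every chunk.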
Conventional RAG uses embedding models to capture semantic associations between text chunks. These models can miss important exact matches, particularly for queries involving unique identifiers or technical phrases. This is where BM25, a ranking function based on lexical matching, comes in handy. Because it excels at matching exact words and phrases, BM25 is especially useful for technical queries that require precise retrieval. By combining contextual embeddings with contextual BM25, RAG systems strike a balance between exact term matching and broader semantic understanding, and so retrieve the most pertinent information more reliably.
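To make the lexical side concrete, here is a minimal, self-contained BM25 scorer over pre-tokenized chunks. The tokenization, the parameters k1 and b, and the example documents are illustrative; a production system would use a library such as rank_bm25 or a search engine like Elasticsearch.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query:
            if df[t] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# An exact-identifier query that embeddings often fuzz over.
docs = [
    "error code TS-999 indicates a storage failure".split(),
    "general troubleshooting steps for storage systems".split(),
]
scores = bm25_scores(["TS-999"], docs)
print(scores)  # the first document scores higher: it contains TS-999 verbatim
```

Exact-match terms like “TS-999” dominate the score, which is precisely the behavior embeddings alone can fail to provide.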
A simpler method may suffice for smaller knowledge bases, where the entire dataset fits into the AI model’s context window. Larger knowledge bases, however, call for more advanced techniques such as Contextual Retrieval, which not only scales to datasets far larger than a single prompt could hold but also significantly improves retrieval accuracy.
A reranking step can further improve Contextual Retrieval’s performance. Reranking filters and reorders the initially retrieved chunks by their relevance to the user’s query. By ensuring that the AI model receives only the most relevant chunks, this step reduces the cost and latency of the generation stage. In Anthropic’s testing, combining Contextual Retrieval with reranking reduced the retrieval failure rate for the top 20 chunks by 67%.
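The two-stage shape of the pipeline can be sketched as below. The scoring functions are stand-ins: `first_pass_score` mimics a cheap retriever (embeddings/BM25) pulling a broad candidate pool, and `rerank_score` mimics a stronger, slower relevance model; the pool size of 150 and top-k of 20 are illustrative choices.

```python
def retrieve_then_rerank(query, chunks, first_pass_score, rerank_score,
                         n_candidates=150, top_k=20):
    """Two-stage retrieval: broad cheap pass, then precise reranking pass."""
    # Stage 1: cheap scoring selects a broad candidate pool.
    candidates = sorted(chunks,
                        key=lambda c: first_pass_score(query, c),
                        reverse=True)[:n_candidates]
    # Stage 2: a stronger model reorders the pool; only the top-k chunks
    # are passed to the generation model.
    return sorted(candidates,
                  key=lambda c: rerank_score(query, c),
                  reverse=True)[:top_k]

# Toy scorer standing in for both stages: count of shared words.
def overlap(query, chunk):
    return len(set(query.split()) & set(chunk.split()))

chunks = ["a b c", "a b", "a", "x y"]
top = retrieve_then_rerank("a b c", chunks, overlap, overlap,
                           n_candidates=3, top_k=2)
print(top)  # → ['a b c', 'a b']
```

Keeping the candidate pool wide in stage 1 and strict in stage 2 is what lets the pipeline stay both fast and accurate.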
In conclusion, Contextual Retrieval is a major step forward in the effectiveness of AI models, especially in settings that demand precise and accurate information retrieval. The combination of contextual BM25, contextual embeddings, and reranking can deliver significant improvements in retrieval accuracy and overall AI performance.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.