Hallucinations (the lies that generative AI models tell, basically) are a big problem for companies looking to integrate the technology into their operations.
Because the models have no real intelligence and simply predict words, images, speech, music and other data according to a private schema, they sometimes get it wrong. Very badly. In a recent article in The Wall Street Journal, a source recounts a case where Microsoft's generative AI invented meeting attendees and implied that conference calls were about topics that weren't actually discussed on the call.
As I wrote a while ago, hallucinations may be an unsolvable problem with current transformer-based model architectures. But several generative AI vendors suggest they can be eliminated, more or less, through a technical approach called retrieval augmented generation, or RAG.
Here's how one vendor, Squirro, pitches it:
At the heart of the offering is the concept of Retrieval Augmented LLMs or Retrieval Augmented Generation (RAG) integrated into the solution… (Our generative AI) is unique in its promise of zero hallucinations. Every piece of information it generates is traceable to a source, ensuring credibility.
Here is a similar pitch from SiftHub:
Using RAG technology and large language models fine-tuned with industry-specific knowledge training, SiftHub enables businesses to generate custom responses without hallucinations. This ensures greater transparency and reduced risk and inspires complete confidence to use AI for all your needs.
RAG was pioneered by data scientist Patrick Lewis, a researcher at Meta and University College London and lead author of the 2020 paper that coined the term. Applied to a model, RAG retrieves documents that may be relevant to a question (for example, a Wikipedia page about the Super Bowl) using what is essentially a keyword search, and then asks the model to generate an answer given this additional context.
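To make the mechanics concrete, here is a minimal sketch of that loop in Python. The corpus, the crude overlap-based scoring and the generate() stand-in are assumptions made for illustration, not any vendor's actual implementation: documents are ranked by keyword overlap with the question and the top match is pasted into the prompt as context.

```python
# Minimal, illustrative RAG loop. The scoring scheme and generate() stand-in
# are assumptions for this sketch, not a specific product's API.

def keyword_score(query: str, document: str) -> int:
    """Count how many query terms also appear in the document (crude keyword search)."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the most keyword overlap with the query."""
    return sorted(corpus, key=lambda d: keyword_score(query, d), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for a call to whatever generative model you use."""
    raise NotImplementedError("plug in your model of choice here")

def rag_answer(query: str, corpus: list[str]) -> str:
    # The retrieved text is injected into the prompt so the model can ground
    # its answer in it instead of relying only on its parametric memory.
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```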
“When you interact with a generative AI model like ChatGPT or Llama and ask a question, the default is for the model to answer from its ‘parametric memory,’ that is, from the knowledge stored in its parameters as a result of training on massive data from the web,” explained David Wadden, a research scientist at AI2, the AI-focused research arm of the nonprofit Allen Institute. “But, just as you are likely to give more accurate answers if you have a reference (such as a book or file) in front of you, the same is true in some cases with models.”
RAG is undeniably useful: it lets you attribute what a model generates to the retrieved documents in order to verify its veracity (and, as a bonus, avoid potentially copyright-infringing regurgitation). RAG also allows companies that do not want their documents used to train a model (for example, companies in highly regulated industries such as healthcare and law) to let models draw on those documents in a more secure and temporary way.
But RAG certainly can't prevent a model from hallucinating. And it has limitations that many vendors gloss over.
Wadden says RAG is most effective in “knowledge-intensive” scenarios where a user wants to use a model to address an “information need” (for example, to find out who won the Super Bowl last year). In these scenarios, the document answering the question is likely to contain many of the same keywords as the question (e.g., “Super Bowl,” “last year”), making it relatively easy to find using a keyword search.
Things get trickier with “reasoning-intensive” tasks like coding and math, where it is harder to specify, in a keyword-based search query, the concepts needed to answer a request, let alone identify which documents might be relevant.
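Reusing the toy retrieve() helper from the sketch above, with an invented two-document corpus, the contrast is easy to see: a fact-lookup question shares distinctive keywords with the document that answers it, while a reasoning-style request overlaps with the corpus mostly on generic words.

```python
# Illustrative only: the corpus and queries are made up for this example.
corpus = [
    "The Kansas City Chiefs won the Super Bowl last year.",
    "Induction is a proof technique that works by proving a base case and an inductive step.",
]

fact_query = "Who won the Super Bowl last year?"
reasoning_query = "Prove that the sum of the first n odd numbers is n squared."

print(retrieve(fact_query, corpus))       # strong overlap on "won", "Super", "Bowl", "last"
print(retrieve(reasoning_query, corpus))  # only generic words like "that" and "is" overlap,
                                          # so the match is weak and largely accidental
```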
Even with basic questions, models can get “distracted” by irrelevant content in the retrieved documents, especially in long documents where the answer is not obvious. Or they may, for reasons still unknown, simply ignore the contents of the retrieved documents and choose to rely on their parametric memory instead.
RAG is also expensive in terms of the hardware needed to apply it at scale.
This is because the retrieved documents, whether from the web, an internal database, or elsewhere, must be stored in memory (at least temporarily) so that the model can query them. Another expense is the compute for the augmented context the model has to process before generating its response. For a technology already notorious for the amount of computing and electricity it requires for even basic operations, this amounts to a serious consideration.
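A rough illustration of where that cost shows up, using made-up passages and whitespace word counts as a crude stand-in for model tokens: the retrieved text has to sit in memory, and the model has to attend over all of it before it answers.

```python
# Assumed, toy passages; word counts are only a rough proxy for model tokens.
retrieved_passages = [
    "Passage one: several paragraphs of background pulled from an internal wiki...",
    "Passage two: a long policy document section that matched the query keywords...",
    "Passage three: meeting notes that may or may not be relevant...",
]
question = "What did we decide about the Q3 launch?"

# Everything below must be held in memory and processed by the model.
augmented_prompt = "\n\n".join(retrieved_passages) + "\n\nQuestion: " + question

print("prompt without retrieval:", len(question.split()), "words")
print("prompt with retrieval:   ", len(augmented_prompt.split()), "words")
```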
That's not to say RAG can't be improved. Wadden pointed to many ongoing efforts to train models to make better use of the documents that RAG retrieves.
Some of these efforts involve models that can “decide” when to use the retrieved documents, or that can skip retrieval altogether if they deem it unnecessary. Others focus on ways to index massive document datasets more efficiently, and on improving search through better representations of documents, representations that go beyond keywords.
“We're pretty good at retrieving documents based on keywords, but not so good at retrieving documents based on more abstract concepts, such as a proof technique needed to solve a mathematical problem,” Wadden said. “Research is needed to build document representations and search techniques that can identify relevant documents for more abstract generation tasks. I think this is an open question at this point.”
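One direction this points to is swapping keyword overlap for learned vector representations. The sketch below assumes a hypothetical embed() function standing in for any sentence-embedding model; documents are ranked by cosine similarity to the query, which can surface conceptually related text even when no keywords match.

```python
# Dense retrieval sketch. embed() is a hypothetical stand-in for a real
# embedding model; the similarity math itself is standard cosine similarity.
import math

def embed(text: str) -> list[float]:
    """Stand-in for a learned embedding model that maps text to a vector."""
    raise NotImplementedError("plug in an embedding model here")

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def dense_retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank documents by semantic similarity to the query rather than keyword overlap."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]
```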
So RAG can help reduce a model's hallucinations, but it's not the answer to all of AI's hallucinatory problems. Be wary of any vendor who tries to claim otherwise.