Why RAG won't solve generative AI's hallucination problem | TechCrunch

Hallucinations, essentially the lies that generative AI models tell, are a big problem for businesses looking to integrate the technology into their operations.

Because models have no real intelligence and are simply predicting words, images, speech, music and other data according to a private schema, they sometimes get it wrong. Very wrong. In a recent piece in The Wall Street Journal, a source describes an instance where Microsoft's generative AI invented meeting attendees and implied that conference calls were about subjects that were never discussed on the call.

As I wrote some time ago, hallucinations may be an intractable problem with today's transformer-based model architectures. But a number of generative AI vendors suggest that they can be, more or less, eliminated through a technical approach called retrieval augmented generation, or RAG.

Here's how one vendor, Squirro, pitches it:

At the core of the offering is the concept of Retrieval Augmented LLMs or Retrieval Augmented Generation (RAG) embedded in the solution … [our generative AI] is unique in its promise of zero hallucinations. Every piece of information it generates can be traced back to a source, ensuring its reliability.

Here's a similar pitch from SiftHub:

Using RAG technology and large language models with industry-specific cognitive training, SiftHub allows companies to generate personalized responses with zero hallucinations. This guarantees increased transparency and reduced risk and inspires complete confidence in using AI for all their needs.

RAG was pioneered by data scientist Patrick Lewis, a researcher at Meta and University College London, and lead author of the 2020 paper that coined the term. Applied to a model, RAG retrieves documents potentially relevant to a query (for example, a Wikipedia page about the Super Bowl) using what is essentially a keyword search, and then asks the model to generate a response given this additional context.
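
To make the retrieve-then-generate flow concrete, here is a minimal sketch, not the method from the 2020 paper. It assumes a tiny in-memory corpus and a naive keyword-count score; the names `DOCUMENTS`, `keyword_score`, `retrieve` and `build_prompt` are hypothetical, and the final call to a model is left out because it depends on whichever model is in use.

```python
import re
from collections import Counter

# Hypothetical in-memory corpus; a real system would query a search index.
DOCUMENTS = {
    "super_bowl": "The Super Bowl is the annual championship game of the National Football League.",
    "transformer": "A transformer is a neural network architecture built around attention.",
    "rag": "Retrieval augmented generation supplies retrieved documents to a language model as context.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_score(query, document):
    """Sum how often each distinct query word occurs in the document."""
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[word] for word in set(tokenize(query)))

def retrieve(query, documents, k=1):
    """Return the ids of the k documents with the highest keyword score."""
    ranked = sorted(documents, key=lambda d: keyword_score(query, documents[d]), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Place the retrieved text ahead of the question as additional context."""
    context = "\n".join(documents[d] for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("Who won the Super Bowl?", DOCUMENTS)
```

The string in `prompt` is what would be sent to the model. Nothing in this flow forces the model to stick to the supplied context, which is the gap the rest of this piece is concerned with.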

“When you're interacting with a generative AI model like ChatGPT or Llama and you ask a question, the model defaults to answering from its 'parametric memory' (that is, from the knowledge stored in its parameters as a result of training on massive data from the web),” David Wadden, a research scientist at AI2, the AI-focused research division of the nonprofit Allen Institute, explained. “But, just as you're likely to give more accurate answers if you have a reference [such as a book or file] in front of you, the same is true in some cases for models.”

RAG is undoubtedly useful: it allows one to attribute what a model generates to the retrieved documents in order to verify its factuality (and, as an added benefit, to avoid potentially copyright-infringing regurgitation). RAG also lets businesses that don't want their documents used to train a model (say, companies in highly regulated industries like healthcare and law) allow models to draw on those documents in a more secure and temporary way.

But RAG certainly can't stop a model from hallucinating. And it has limitations that many vendors gloss over.

Wadden says RAG is most effective in “information-rich” scenarios where the user wants to use the model to satisfy an “information need,” for example, to find out who won the Super Bowl last year. In these scenarios, the document that answers the query will likely contain many of the keywords in the query (e.g., “Super Bowl,” “last year”), making it relatively easy to find through keyword search.

Things get trickier with “reasoning-rich” tasks like coding and math, where it's hard to express the concepts needed to answer a request in a keyword-based search query, much less identify which documents might be relevant.
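
The contrast between the two kinds of task can be sketched with a toy overlap measure. This is an illustration under stated assumptions, not a real retrieval metric: the queries and documents are made up, the stopword list is ad hoc, and `keywords` and `overlap` are hypothetical helper names.

```python
import re

# Small ad hoc stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "that", "in", "for",
             "who", "what", "by", "and", "then", "show"}

def keywords(text):
    """Return the set of lowercase words in the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def overlap(query, document):
    """Fraction of the query's keywords that also occur in the document."""
    q = keywords(query)
    return len(q & keywords(document)) / len(q) if q else 0.0

# Made-up examples: a fact lookup and a math request.
fact_query = "Who won the Super Bowl last year?"
fact_doc = "Recap of last year's Super Bowl: the team that won, the final score, and the key plays."

math_query = "Show that the sum of the first n odd numbers equals n squared."
math_doc = "Proof by induction: establish a base case, then prove the inductive step."

fact_overlap = overlap(fact_query, fact_doc)
math_overlap = overlap(math_query, math_doc)
```

In this toy setup the fact lookup shares every one of its keywords with the document that answers it, while the math request shares none with the document describing the technique it needs, so a keyword search has nothing to match on.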

Even with basic queries, models can get “distracted” by irrelevant content in documents, especially in long documents where the answer isn't obvious. Or they may, for reasons as yet unknown, simply ignore the content of retrieved documents and rely instead on their parametric memory.

RAG is also expensive in terms of the hardware required to implement it at scale.

This is because retrieved documents, whether from the web, an internal database or elsewhere, have to be stored in memory, at least temporarily, so that the model can refer back to them. Another expense is the compute for the larger context that a model has to process before generating its response. For a technology that's already notorious for the amount of compute and power it requires even for basic operations, that's a serious consideration.
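
A back-of-the-envelope sketch shows how quickly the context grows. All of the numbers here are assumptions picked for illustration (a 20-token question, five retrieved documents of 800 tokens each), and `prompt_tokens` and `context_growth` are hypothetical helpers, not part of any library.

```python
def prompt_tokens(query_tokens, docs_retrieved, tokens_per_doc):
    """Total tokens the model must read: the question plus every retrieved document."""
    return query_tokens + docs_retrieved * tokens_per_doc

def context_growth(query_tokens, docs_retrieved, tokens_per_doc):
    """How many times larger the prompt is than the bare question."""
    return prompt_tokens(query_tokens, docs_retrieved, tokens_per_doc) / query_tokens

# Assumed workload: 20-token question, 5 documents, 800 tokens per document.
total = prompt_tokens(20, 5, 800)
growth = context_growth(20, 5, 800)
```

Under these assumptions the model reads 4,020 tokens instead of 20, roughly 200 times more input for every request, before it has generated a single word of its answer.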

That's not to say RAG can't be improved. Wadden noted many ongoing efforts to train models to make better use of the documents RAG retrieves.

Some of these efforts involve models that can “decide” when to make use of the documents, or models that can choose not to perform retrieval in the first place if they deem it unnecessary. Others focus on ways to index massive data sets of documents more efficiently, and on improving search through better representations of documents.

“We're very good at retrieving documents based on keywords, but not so good at retrieving documents based on more abstract concepts, such as a proof technique needed to solve a math problem,” Wadden said. “Research is needed to develop document representations and search techniques that can identify relevant documents for more abstract generation tasks. I think it's mostly an open question at this point.”

So RAG can help reduce a model's hallucinations, but it's not the answer to all of AI's hallucination problems. Be wary of any vendor that tries to claim otherwise.
