About one year ago, Contextual AI launched RAG 2.0, an evolution of the Retrieval-Augmented Generation methodology, to enhance the reliability of AI-generated content. Here is how it works and why it has not gained a wide following (so far).

Uncertain factual accuracy and difficulties in updating data: the limitations of Generative AI

In the world of GenAI, ensuring accurate and up-to-date answers has been a challenge from the very beginning.

Limited factual accuracy and the difficulty of updating datasets are two crucial issues in this field: Large Language Models (LLMs) are very good at generating coherent, linguistically correct and contextually relevant text, but they cannot guarantee its reliability.

This is why they often grapple with so-called “hallucinations”: instances where the AI generates plausible-sounding but incorrect information.

LLMs, in fact, are very complex probabilistic models that predict the next word on the basis of a probability calculation. This sophistication allows them to give answers of such quality that they all seem correct, but, as we know, this is not always true.
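To make the idea of "predicting the next word from a probability calculation" concrete, here is a minimal sketch. The candidate words and their scores are invented for illustration and do not come from any real model; the point is only that the most probable continuation is not necessarily the correct one.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the word that follows
# "The capital of Australia is" (values are made up).
scores = {"Sydney": 2.1, "Canberra": 1.9, "Melbourne": 0.7}

probs = softmax(scores)
best = max(probs, key=probs.get)

# The model picks the most likely word, which here is fluent,
# plausible and wrong: the correct answer is Canberra.
print(best, round(probs[best], 2))
```

In this toy case the wrong answer wins by a small margin, which is exactly what a hallucination looks like from the outside: a confident, well-formed sentence built on the most probable word rather than the true one.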

Avoiding LLM hallucinations and keeping models up to date: here’s why RAG was created

The issue of hallucinations arises because LLMs rely heavily on static training data, limiting their ability to incorporate new information post-training. Basically, LLMs generate sentences based on the data used to train them.

The concept of Retrieval-Augmented Generation (RAG) was introduced to address these limitations, since RAG methodology enhances the performance of GenAI models by integrating them with external data sources.

This approach allows LLMs to retrieve and utilize up-to-date information from a designated knowledge base, grounding their answers in current and validated data.

By combining the strengths of information retrieval and natural language generation, RAG has proven able to reduce LLMs’ propensity to hallucinate and to improve the overall reliability of AI-generated content.

Discovering Retrieval-Augmented Generation: how it works and its benefits

Looking more closely at Retrieval-Augmented Generation, a key element of this methodology is its architecture, which introduces an additional step compared with standard Generative AI models, which simply receive one sequence of words as input and return another as output.

With the RAG methodology, instead, the input is still passed directly to the text generator, but it is also used to retrieve a set of relevant documents from an additional source.

The original input and the retrieved documents complement each other: the model integrates the information from both and can generate correct answers even when these do not appear verbatim in any of the documents.

Above all, LLMs using RAG provide unprecedented flexibility: to obtain up-to-date answers there is no need to retrain them; it is enough to replace the documents from which the information is retrieved.
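The retrieval step can be sketched in a few lines. This is a deliberately simplified illustration, not a production pipeline: relevance is measured by plain word overlap instead of embeddings, the documents are invented, and the call to the LLM is left out (only the prompt that would be sent to it is built). All function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Toy relevance: number of query words that also occur in the document."""
    shared = Counter(tokenize(query)) & Counter(tokenize(document))
    return sum(shared.values())

def retrieve(query, documents, k=1):
    """Return the k documents that best match the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Combine the retrieved context with the original question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented knowledge bases: the second one is an updated version of the first.
old_docs = ["The reporting deadline is 31 March.",
            "Fines are capped at 2% of turnover."]
new_docs = ["The reporting deadline is 30 June.",
            "Fines are capped at 2% of turnover."]

question = "What is the reporting deadline?"
print(build_prompt(question, old_docs))
print(build_prompt(question, new_docs))
```

Swapping `old_docs` for `new_docs` changes the context the generator receives without touching the model itself, which is the flexibility described above.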

And this is why RAG addresses a crucial need of GenAI models, namely to be able to access not only large amounts of information, but especially the right information.

From the original RAG to RAG 2.0: key enhancements introduced by Contextual AI

As shown, RAG represents a huge step forward in the race for GenAI reliability, but hallucinations still affect LLMs.

For this reason, many players in the field of Artificial Intelligence are working on ways to overcome the issue, and RAG 2.0 is one of them. Building upon the foundational RAG framework, Contextual AI introduced RAG 2.0 in March 2024.

This advancement was driven by Douwe Kiela, co-founder and CEO of Contextual AI, who co-developed the original RAG methodology in 2020 during his tenure at Facebook AI Research.

The primary motivation behind RAG 2.0 was to refine and enhance the integration between retrieval mechanisms and generative models, addressing inefficiencies observed in earlier implementations.

Original RAG systems operated by loosely coupling pre-existing models, vector databases, and embeddings. While this modular approach facilitated initial integrations, it sometimes resulted in inefficiencies and a lack of cohesion among components.

RAG 2.0 addresses these challenges through a holistic design that emphasizes end-to-end optimization.

By pre-training, fine-tuning, and aligning all components, including both the retriever and the LLM, as a unified system, RAG 2.0 aims to ensure seamless interaction and improved performance. This cohesive integration minimizes the disjointedness of previous models, leading to more reliable and contextually accurate outputs.
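Contextual AI has not published the training recipe behind RAG 2.0, so the following is not its method. It is only a toy illustration of what optimizing a retriever and a generator "as a unified system" can mean, using the kind of objective described in the original 2020 RAG paper: the probability of the correct answer is summed over the retrieved documents, each weighted by the retriever's confidence in it. All numbers are invented.

```python
import math

def softmax(scores):
    """Turn retriever scores into a probability for each document."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(retriever_scores, answer_probs):
    """Negative log of sum over documents of p(doc | question) * p(answer | question, doc)."""
    doc_probs = softmax(retriever_scores)
    marginal = sum(p_doc * p_ans for p_doc, p_ans in zip(doc_probs, answer_probs))
    return -math.log(marginal)

# Hypothetical generator behaviour: with document 0 in context it gives the
# correct answer with probability 0.9, with document 1 only 0.1.
answer_probs = [0.9, 0.1]

# A retriever that prefers the unhelpful document vs. one that prefers the useful one.
loss_bad_retriever = joint_loss([0.0, 2.0], answer_probs)
loss_good_retriever = joint_loss([2.0, 0.0], answer_probs)

print(loss_good_retriever < loss_bad_retriever)
```

Because a single loss depends on both the retriever's scores and the generator's probabilities, minimizing it pushes the two components to improve together, rather than tuning each one in isolation as in a loosely coupled setup.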

What’s the current status of RAG 2.0? Demonstrated impact and limitations

The implementation of RAG 2.0 has yielded notable advancements in AI performance metrics. Contextual AI’s Contextual Language Models (CLMs), developed with the RAG 2.0 framework, have set new reference results across various industry benchmarks.

Empirical evaluations indicated that these CLMs surpass strong RAG baselines, including those based on models like GPT-4 and other leading open-source alternatives.

This performance leap underscores RAG 2.0’s potential to deliver Generative AI solutions that are not only robust and reliable but also consistently up to date (provided, of course, that the internal knowledge base is constantly updated), making it particularly advantageous for enterprise applications where accuracy and timeliness are crucial.

By seamlessly integrating retrieval processes with generative capabilities, RAG 2.0 effectively mitigates issues of outdated information and hallucinations, paving the way for AI applications that are both intelligent and trustworthy.

Despite this demonstrated impact, which should have made RAG 2.0 a pivotal step in the quest to enhance the factual accuracy and dependability of GenAI systems, the methodology has not gained a wide following (so far).

Why? There are two likely reasons. The first is that Contextual AI has shared only a few details, especially on how the LLM is contextualized together with the module that retrieves the documents relevant to a question.

The second is that, judging from what Contextual AI has shared, contextualizing the LLM requires training it jointly with the retriever. This probably raises costs, so many teams may prefer to accept a higher rate of hallucinations rather than commit the resources and budget needed to train the model.

How we ensure Generative AI reliability at Aptus.AI, starting from data to R&D operations

At Aptus.AI we have been dealing with these issues for some years now. Building on our research and development, we created a Generative AI-based Assistant capable of minimizing the effect of hallucinations in the field of legal and regulatory analysis.

As highlighted, Generative AI solutions are not a valid option as they are. This was confirmed by Stanford University, which in June 2024 published a paper analyzing the effectiveness of Generative AI solutions in the legal sector: it found a hallucination rate that is unacceptable compared with the expectations of this market and not aligned with the selling propositions used by some companies in the sector.

Aptus.AI’s AI Assistant overcomes these limitations by exploiting our proprietary machine-readable format, which allows us to increase GenAI reliability and minimize the negative effect of hallucinations.

Currently it is very hard to apply Generative AI tools to the legal and regulatory fields, precisely because the data in regulatory sources is inaccessible. Our proprietary technology transforms legal documents into a machine-readable format, on which we apply our Normative Analysis, so that GenAI can easily access the information needed to answer questions or to generate legal content.

This information includes all the context and data needed to answer a given question or to generate a document or other content at all levels of the regulatory hierarchy, so that the accuracy of the AI-generated content is guaranteed. And that’s not all.

In fact, our R&D team is constantly at work to keep enhancing the performance of Aptus.AI’s AI Assistant, exploiting the state of the art in LLM architectures and methodologies, such as the so-called CAG (Cache-Augmented Generation), which we will address in another blog post.