Large language models (LLMs) are remarkable at compressing knowledge about the world into billions of parameters.
However, LLMs have two main limitations: their knowledge is frozen at the time of their last training run, and they sometimes invent facts (hallucinate) when asked very specific questions.
With retrieval-augmented generation (RAG), we can give an already trained LLM access to highly specific information as additional context when it answers our questions.
In this article, I will cover the theory and practice of extending Google's Gemma LLM with RAG capabilities using the Hugging Face Transformers library, LangChain, and the Faiss vector database.
The following figure shows an overview of the RAG process, which we will implement step by step.
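Before turning to those libraries, the core idea of the process can be sketched in a few lines of plain Python. This is only a toy illustration: the word-overlap scoring and the two-document "store" are stand-ins for the real embedding vectors and Faiss index we will build later.

```python
import re
from collections import Counter

# Toy document store (assumption: a real system keeps embedding vectors
# in a vector database such as Faiss, not raw strings and word counts).
documents = [
    "Gemma is a family of open LLMs released by Google.",
    "Faiss is a library for efficient similarity search over vectors.",
]

def tokens(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def similarity(a, b):
    """Word-overlap score; a stand-in for cosine similarity of embeddings."""
    return sum((tokens(a) & tokens(b)).values())

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda d: similarity(query, d), reverse=True)[:k]

def build_prompt(query, docs):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is Faiss?", documents))
```

The shape is the same in the full implementation: embed the documents, retrieve the most relevant ones for a query, and prepend them to the prompt that is sent to the LLM.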