
Image by the author | Midjourney and Canva
Do you want local RAG with a minimum of problems? Do you have a bunch of documents that you want to treat as a knowledge base for augmenting a language model? Do you want to create a chatbot that knows what you want it to know?
Well, this is possibly the easiest way.
It may not be the most optimized setup in terms of inference speed, retrieval accuracy, or storage, but it is very simple. Adjustments can be made if desired, but even without them, what we do in this short tutorial should leave your local RAG system fully operational. And since we will be using Llama 3, we can also expect great results.
What do we use today as tools? Three llamas: Ollama for model management, Llama 3 as our language model, and LlamaIndex as our RAG framework. Llama, llama, llama.
Let us begin.
Step 1: Ollama, for model management
Ollama can be used to manage and interact with language models. Today we will use it for model management and, since LlamaIndex can talk directly to Ollama-managed models, indirectly for interaction as well. This will make our overall process even easier.
We can install Ollama by following the system-specific instructions in the application's GitHub repository.
Once installed, we can run Ollama from the terminal and specify the model we want to use.
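A quick sanity check from the terminal confirms the CLI is available and shows which models are already on your machine (this assumes the ollama binary is on your PATH):

```bash
# Confirm the Ollama CLI is installed
ollama --version

# List any models already downloaded locally
ollama list
```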
Step 2: Llama 3, the language model
Once Ollama is installed and operational, we can download any of the models listed in its GitHub repository or create our own Ollama-compatible model from other existing language model implementations. Using the Ollama run command will download the specified model if it is not present on your system, so downloading Llama 3 8B can be done with the following line:
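```bash
# Pulls Llama 3 8B if it is not already present, then starts an interactive session
ollama run llama3
```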
Just make sure you have the local storage available to accommodate the 4.7GB download.
Once the Ollama terminal app starts up with the Llama 3 model as its backend, you can go ahead and minimize it. We will interact with the model from our own script via LlamaIndex.
Step 3: LlamaIndex, the RAG framework
The last piece of this puzzle is LlamaIndex, our RAG framework. To use LlamaIndex, you will need to make sure it is installed on your system. As the LlamaIndex package and namespace have undergone recent changes, it is best to consult the official documentation to install LlamaIndex in your local environment.
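At the time of writing, an installation along these lines should cover the imports used in the script below, though it is worth double-checking the package names against the current docs:

```bash
pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
```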
Once that is done, and with Ollama running with the Llama 3 model active, you can save the following to a file (adapted from the LlamaIndex starter example for local models in the official documentation):
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
# My local documents
documents = SimpleDirectoryReader("data").load_data()
# Embeddings model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Language model
Settings.llm = Ollama(model="llama3", request_timeout=360.0)
# Create index
index = VectorStoreIndex.from_documents(documents)
# Perform RAG query
query_engine = index.as_query_engine()
response = query_engine.query("What are the 5 stages of RAG?")
print(response)
This script does the following:
- The documents are stored in the “data” folder.
- The embedding model used to create the document embeddings for RAG is a BGE variant (BAAI/bge-base-en-v1.5) from Hugging Face.
- The language model is the aforementioned Llama 3, which is accessed through Ollama.
- The query asked about our data (“What are the 5 stages of RAG?”) is appropriate since I left several RAG-related documents in the data folder.
And the result of our query:
The five key stages within RAG are: Loading, Indexing, Storing, Querying, and Evaluation.
Note that we would probably want to optimize the script in several ways to facilitate faster searching and to maintain some state (the embeddings, for example), but I will leave that for the interested reader to explore.
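For instance, one straightforward improvement is persisting the index to disk so the embeddings are not rebuilt on every run. Here is a minimal sketch along the lines of the LlamaIndex starter example, where the ./storage directory name is just an illustrative choice:

```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# The Settings.embed_model / Settings.llm configuration from the script above still applies.
PERSIST_DIR = "./storage"  # illustrative location for the saved index

if not os.path.exists(PERSIST_DIR):
    # First run: build the index from the documents and save it to disk
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the saved index instead of re-embedding everything
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
```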
Final thoughts
Well, we did it. We got Ollama serving a LlamaIndex-based RAG application using Llama 3 locally, in three fairly easy steps. Much more can be done with this, including optimizing, extending, adding a UI, and so on, but the fact is that we got our baseline system up and running with just a few lines of code and a minimal set of supporting applications and libraries.
I hope you enjoyed the process.
Matthew Mayo (@mattmayo13) has a master's degree in computer science and a postgraduate diploma in data mining. As Editor-in-Chief, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by the mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.