For additional ideas on how to improve the performance of your RAG pipeline so that it is production-ready, continue reading here.
This section covers the required packages and API keys you need to follow along with this article.
Required packages
This article will guide you through implementing a naive and an advanced RAG pipeline using LlamaIndex in Python.
pip install llama-index
In this article, we will use LlamaIndex v0.10. If you are upgrading from a previous version of LlamaIndex, you must run the following commands to install and run LlamaIndex correctly:
pip uninstall llama-index
pip install llama-index --upgrade --no-cache-dir --force-reinstall
LlamaIndex offers an option to store vector embeddings locally in JSON files for persistent storage, which is great for quickly prototyping an idea. However, we will use a vector database for persistent storage, since advanced RAG techniques target production-ready applications.
Since we will need metadata storage and hybrid search capabilities in addition to storing the vector embeddings, we will use the open source vector database Weaviate (v3.26.2), which supports these features.
pip install weaviate-client llama-index-vector-stores-weaviate
API Keys
We'll use embedded Weaviate, which you can use for free without signing up for an API key. However, this tutorial uses an embedding model and LLM from OpenAI, for which you will need an OpenAI API key. To get one, you need an OpenAI account, where you can then create a new secret key under API keys.
Next, create a local .env file in your root directory and define your API keys in it:
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
You can then load your API keys with the following code:
# !pip install python-dotenv
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
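As an optional sanity check, you can confirm that the key was actually picked up from the .env file. This is a minimal sketch that only assumes the OPENAI_API_KEY variable defined above:
# Confirm the API key was loaded (prints only a masked prefix, never the full key)
api_key = os.getenv("OPENAI_API_KEY")
assert api_key is not None, "OPENAI_API_KEY not found - check your .env file"
print(f"Loaded OpenAI key starting with {api_key[:7]}...")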
This section discusses how to implement a naive RAG pipeline using LlamaIndex. You can find the entire naive RAG pipeline in this Jupyter Notebook. For an implementation using LangChain, see this article (naive RAG pipeline using LangChain).
Step 1: Define the embedding model and LLM
First, you can define an embedding model and LLM in a global configuration object. Doing this means that you do not need to specify the models explicitly again in the code.
- Embedding model: used to generate vector embeddings for the document chunks and the user query.
- LLM: used to generate an answer based on the user query and the relevant context.
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
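If you want to make sure the global configuration works before building the pipeline, a quick, purely illustrative check (not part of the original setup) is to embed a short test string with the configured embedding model:
# Optional: verify the OpenAI connection by embedding a short test string
test_embedding = Settings.embed_model.get_text_embedding("Hello, world!")
print(f"Embedding dimension: {len(test_embedding)}")  # typically 1536 for the default OpenAI embedding model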
Step 2: Load data
Next, you will create a local directory called data in your root directory and download some sample data from the LlamaIndex GitHub repository (MIT license).
!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
You can then load the data for further processing:
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()
Step 3: Fragment documents into nodes
Since the entire document is too large to fit into the LLM's context window, you will need to split it into smaller text chunks, which are called Nodes in LlamaIndex. You can parse the loaded documents into nodes using the SimpleNodeParser with a defined chunk size of 1024.
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)
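If you want to see what the parser produced, you can inspect the resulting nodes. This is an optional step, shown here purely for illustration:
# Inspect the parsed nodes
print(f"Number of nodes: {len(nodes)}")
print(nodes[0].get_content()[:200])  # first 200 characters of the first chunk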
Step 4: Create index
Next, you will build the index that stores all the external knowledge in Weaviate, an open source vector database.
First, you will need to connect to a Weaviate instance. In this case, we are using embedded Weaviate, which allows you to experiment in notebooks for free without an API key. For a production-ready solution, deploying Weaviate yourself, e.g., via Docker, or using a managed service is recommended.
import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(),
)
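You can optionally confirm that the embedded instance started correctly. A minimal check with the v3 Python client used above:
# Check that the embedded Weaviate instance is up and reachable
print(client.is_ready())  # should print True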
Next, you will build a VectorStoreIndex from the Weaviate client to store and interact with your data.
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name=index_name,
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Set up the index
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
)
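If the data has already been ingested into Weaviate in a previous run, you do not need to re-chunk and re-embed it. A sketch of how you could reconnect to the existing index, assuming the same index_name as above:
# Re-open the index on top of the existing Weaviate collection,
# skipping the chunking and embedding steps
existing_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)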
Step 5: Configure the query engine
Finally, you will configure the index as a query engine.
# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()
Step 6: Run a Naive RAG Query on Your Data
Now, you can run a naive RAG query on your data, as shown below:
# Run your naive RAG query
response = query_engine.query(
    "What happened at Interleaf?"
)
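To inspect both the generated answer and the context passages it was grounded in, you can print the response object and its source nodes (an optional, illustrative step):
# Print the generated answer
print(str(response))

# Inspect the retrieved context chunks and their similarity scores
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:100])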
In this section, we'll cover some simple tweaks you can make to turn the above naive RAG pipeline into an advanced one. This tutorial will cover the following selection of advanced RAG techniques: sentence window retrieval, hybrid search, and reranking.
Since we will only cover the modifications here, you can find the complete end-to-end advanced RAG pipeline in this Jupyter Notebook.
Sentence window retrieval
For the sentence window retrieval technique, you need to make two adjustments: First, you must adjust how you store and post-process your data. Instead of the SimpleNodeParser, we will use the SentenceWindowNodeParser.
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
The SentenceWindowNodeParser does two things:
- It splits the document into individual sentences, which will be embedded.
- For each sentence, it creates a context window. If you specify window_size=3, the resulting window will be three sentences long, starting at the sentence before the embedded sentence and spanning the sentence after it. The window is stored as metadata (see the short example after this list).
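To see this in practice, you can re-parse the documents with the sentence window parser defined above and inspect one of the resulting nodes. This small illustration assumes the documents loaded earlier:
# Re-parse the documents with the sentence window parser
window_nodes = node_parser.get_nodes_from_documents(documents)

# The node text is a single sentence; the surrounding window is stored as metadata
print(window_nodes[1].get_content())
print(window_nodes[1].metadata["window"])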
During retrieval, the sentence that most closely matches the query is returned. After retrieval, you must replace the sentence with the entire window from the metadata by defining a MetadataReplacementPostProcessor and using it in the list of node_postprocessors.
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)
...
query_engine = index.as_query_engine(
    node_postprocessors=[postproc],
)
Hybrid search
Implementing hybrid search in LlamaIndex is as easy as changing two parameters on the query_engine, provided the underlying vector database supports hybrid search queries. The alpha parameter specifies the weighting between vector search and keyword-based search, where alpha=0 means pure keyword-based search and alpha=1 means pure vector search.
query_engine = index.as_query_engine(
    ...,
    vector_store_query_mode="hybrid",
    alpha=0.5,
    ...
)
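The best weighting depends on your data and queries. A purely illustrative way to get a feel for it is to run the same question with different alpha values and compare the answers:
# Compare keyword-heavy vs. vector-heavy retrieval for the same query
for alpha in [0.0, 0.5, 1.0]:
    hybrid_engine = index.as_query_engine(
        vector_store_query_mode="hybrid",
        alpha=alpha,
    )
    response = hybrid_engine.query("What happened at Interleaf?")
    print(f"alpha={alpha}: {str(response)[:100]}")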
Reranking
Adding a reranker to your advanced RAG pipeline only requires three simple steps:
- First, define a reranking model. Here, we use BAAI/bge-reranker-base from Hugging Face.
- In the query engine, add the reranker model to the list of node_postprocessors.
- Increase the similarity_top_k in the query engine to retrieve more context passages, which can be reduced to top_n after reranking.
# !pip install torch sentence-transformers
from llama_index.core.postprocessor import SentenceTransformerRerank

# Define reranker model
rerank = SentenceTransformerRerank(
    top_n=2,
    model="BAAI/bge-reranker-base"
)
...
# Add reranker to query engine
query_engine = index.as_query_engine(
    similarity_top_k=6,
    ...,
    node_postprocessors=[rerank],
    ...,
)
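Putting the three techniques together, the advanced query engine could be assembled along the following lines. This is a sketch rather than the complete notebook pipeline: it assumes the index was rebuilt from the SentenceWindowNodeParser nodes and reuses the postproc and rerank objects defined above.
# Combine sentence window retrieval, hybrid search, and reranking
query_engine = index.as_query_engine(
    similarity_top_k=6,
    vector_store_query_mode="hybrid",
    alpha=0.5,
    node_postprocessors=[postproc, rerank],
)

# Run an advanced RAG query
response = query_engine.query("What happened at Interleaf?")
print(str(response))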