Introduction
Imagine you’re in a bookstore looking for the perfect book. You want recommendations that not only match your favorite genre, but are also varied enough to introduce you to new authors. Retrieval-augmented generation (RAG) systems work in a similar way, combining the advantages of finding relevant information and generating creative responses. To measure the performance of these systems, we use metrics such as hit rate, which checks how often the correct recommendations appear, and mean reciprocal rank (MRR), which looks at the position of those recommendations. Maximum marginal relevance (MMR) helps ensure that the suggestions are both relevant and diverse. By using these metrics, we can ensure that the recommendations are not only accurate, but also varied and interesting.
Overview
- Learn about hit rate, MMR, and their roles in evaluating Retrieval-Augmented Generation (RAG) systems.
- Learn how to use maximum marginal relevance to balance relevance and diversity in retrieved results.
- Master the calculation of hit rate and mean reciprocal rank (MRR) to evaluate the effectiveness of information retrieval.
- Develop skills to analyze and improve RAG systems using various performance metrics.
What is the hit rate?
Hit rate is one of the measures used to evaluate the performance of recommender systems. It measures how often the desired item appears in the top-N recommendations. In the RAG framework, the hit rate indicates how often the relevant data is correctly included in the output produced.
How to calculate the hit rate?
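Calculating the hit rate involves dividing the number of queries for which the relevant item appears in the top-N recommendations by the total number of queries. In mathematical terms, it is expressed as follows:
$$\text{Hit rate} = \frac{\text{number of queries with the relevant item in the top-}N}{\text{total number of queries}}$$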
Let us understand it better with an example. We have three queries Q1, Q2, Q3. We also know the exact node that needs to be selected for those queries. The actual nodes for those queries are N1, N2, N3. Now, on sending those queries, we get nodes from our retriever. The nodes retrieved for those queries are as mentioned below:
We can see that our retriever has retrieved the correct node for Q1 and Q2, but not for Q3. Therefore, Q1 and Q2 each count as a hit (1) and Q3 counts as a miss (0). By using our formula, we can calculate the hit rate:
Hit rate = (1 + 1 + 0) / 3 = 2/3 ≈ 66.7%
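The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `hit_rate` and the `top_n` parameter are assumptions, and the retrieved lists are hypothetical, chosen only so that Q1 and Q2 are hits and Q3 is a miss.

```python
def hit_rate(retrieved, expected, top_n=3):
    """Fraction of queries whose expected node appears in the top_n retrieved nodes."""
    hits = sum(
        1 for query, node in expected.items() if node in retrieved[query][:top_n]
    )
    return hits / len(expected)


# Expected (ground-truth) node for each query, as in the example above.
expected = {"Q1": "N1", "Q2": "N2", "Q3": "N3"}

# Hypothetical retrieved lists: correct node found for Q1 and Q2, missed for Q3.
retrieved = {
    "Q1": ["N2", "N3", "N1"],
    "Q2": ["N2", "N5", "N4"],
    "Q3": ["N1", "N2", "N4"],
}

print(f"Hit rate: {hit_rate(retrieved, expected):.4f}")  # Hit rate: 0.6667
```

Lowering `top_n` makes the metric stricter: with `top_n=2`, the correct node for Q1 (at position three) no longer counts as a hit.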
Now that we understand the hit rate metric to evaluate our model, we will look at the challenges we face when using hit rate as our evaluation metric.
Challenge with hit rate
The biggest challenge we face when using hit rate as an evaluation metric is that it does not take into account the position of the retrieved node. To understand this better, let us look at an example. Suppose we have two retrievers: retriever 1 and retriever 2. The following image shows the nodes retrieved by both retrievers.
In the image above we can see that both retrievers have retrieved the correct node for Q1 and Q2, but not for Q3. Therefore, both get the same hit rate.
But on further inspection, we can see that retriever 1 has retrieved the correct node for Q1 at position three, while retriever 2 has retrieved it at position one. Retriever 2 should therefore get a higher score than retriever 1, but the hit rate does not take the position of the retrieved nodes into account. This is where the mean reciprocal rank (MRR) metric comes into the picture.
Mean reciprocal rank (MRR)
Mean reciprocal rank (MRR) is a statistical metric used to evaluate the effectiveness of an information retrieval system. It is especially useful in situations where the system answers a query by returning an ordered list of items (such as documents or answers). In the context of retrieval-augmented generation (RAG), MRR is used to evaluate how well the retrieval component surfaces the relevant documents needed to generate accurate and relevant answers.
How to calculate MRR?
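$$\text{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\text{rank}_i}$$
Here, N is the number of queries and rank_i is the rank position of the first relevant document for the i-th query. If no relevant document is retrieved for a query, its reciprocal rank is taken as 0.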
Let's look at an example for MRR.
In the above image we can see that the reciprocal rank for Q1 is ⅓, since the correct retrieved node is at the third position. With the correct node for Q2 at the first position and no correct node retrieved for Q3, the MRR is calculated as
MRR = (1/3 + 1 + 0) / 3 = 4/9 ≈ 44.4%
We can see that although the hit rate is 66.7%, the MRR is only 44.4%, because MRR gives more weight to retrievers that place the correct nodes at earlier positions.
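To make the difference from hit rate concrete, here is a minimal Python sketch that computes MRR for two retrievers. The function names and the retrieved lists are hypothetical. The lists are chosen so that retriever 1 finds the correct node for Q1 at position three and retriever 2 finds it at position one, while both find the Q2 node at position one and miss Q3.

```python
def reciprocal_rank(retrieved_nodes, expected_node):
    """Return 1/position of the expected node, or 0.0 if it was not retrieved."""
    for position, node in enumerate(retrieved_nodes, start=1):
        if node == expected_node:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(retrieved, expected):
    """Average the reciprocal rank over all queries."""
    total = sum(reciprocal_rank(retrieved[q], expected[q]) for q in expected)
    return total / len(expected)


expected = {"Q1": "N1", "Q2": "N2", "Q3": "N3"}

# Hypothetical outputs: both retrievers have the same hit rate (2 of 3 queries).
retriever_1 = {
    "Q1": ["N2", "N3", "N1"],  # correct node at position 3
    "Q2": ["N2", "N5", "N4"],  # correct node at position 1
    "Q3": ["N1", "N2", "N4"],  # correct node missing
}
retriever_2 = {
    "Q1": ["N1", "N3", "N2"],  # correct node at position 1
    "Q2": ["N2", "N5", "N4"],
    "Q3": ["N1", "N2", "N4"],
}

mrr_1 = mean_reciprocal_rank(retriever_1, expected)
mrr_2 = mean_reciprocal_rank(retriever_2, expected)
print(f"Retriever 1 MRR: {mrr_1:.4f}")  # Retriever 1 MRR: 0.4444
print(f"Retriever 2 MRR: {mrr_2:.4f}")  # Retriever 2 MRR: 0.6667
```

Both retrievers hit two of the three queries, yet retriever 2 scores higher on MRR because it ranks the correct node for Q1 first.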
Maximum marginal relevance (MMR)
Maximum marginal relevance (MMR) reorders results to improve both their relevance and diversity. To ensure that the returned items are relevant and varied enough to address all facets of the query, MMR attempts to strike a balance between novelty and relevance.
How to calculate MMR?
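$$\text{MMR} = \arg\max_{d_i \in D \setminus R} \left[ \lambda \, \text{Sim}_1(d_i, q) - (1 - \lambda) \max_{d_j \in R} \text{Sim}_2(d_i, d_j) \right]$$
Here, D is the set of all candidate documents, R is the set of already selected documents, q is the query, Sim1 is the similarity function between a document and the query, and Sim2 is the similarity function between two documents. di is a candidate document in D that has not yet been selected, and dj is a document in R.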
The parameter λ (mmr_threshold) controls the balance between relevance (the first term) and diversity (the second term). When mmr_threshold is close to 1, the system prioritizes relevance; when it is close to 0, it prioritizes diversity.
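The effect of λ can be seen with a small sketch that scores a single candidate. The helper `mmr_score` and the input values (relevance 0.7, similarity 0.5 to one already selected document) are illustrative assumptions, not part of any library.

```python
def mmr_score(relevance, similarities_to_selected, lam):
    """MMR score of one candidate: lam * relevance - (1 - lam) * redundancy."""
    # Redundancy is the largest similarity to any already selected document
    # (0.0 when nothing has been selected yet).
    redundancy = max(similarities_to_selected, default=0.0)
    return lam * relevance - (1.0 - lam) * redundancy


# Illustrative candidate: relevance 0.7, similarity 0.5 to one selected document.
for lam in (1.0, 0.5, 0.0):
    print(f"lambda={lam}: {mmr_score(0.7, [0.5], lam):.2f}")
# lambda=1.0: 0.70   (pure relevance)
# lambda=0.5: 0.10   (equal weight)
# lambda=0.0: -0.50  (pure diversity: only the redundancy penalty remains)
```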
Let's look at a simple example that illustrates MMR. We'll use the same hit rate example to demonstrate how MMR reranks retrieved nodes.
To work through MMR, let's assume the following relevance scores:
- Rel(N2,Q1)=0.7
- Rel(N3,Q1)=0.6
- Rel(N1,Q1)=0.9
- Rel(N3,Q2)=0.9
- Rel(N5,Q2)=0.3
- Rel(N1,Q2)=0.6
- Rel(N1,Q3)=0.8
- Rel(N2,Q3)=0.5
- Rel(N4,Q3)=0.4
Similarity scores between nodes:
- Sim(N2,N3)=0.2
- Sim(N2,N1)=0.5
- Sim(N3,N1)=0.3
- Sim(N3,N5)=0.4
- Sim(N5,N1)=0.6
- Sim(N1,N2)=0.3
- Sim(N1,N4)=0.4
- Sim(N2,N4)=0.5
For simplicity, let's set λ = 0.5 to give equal weight to relevance and diversity.
MMR calculation
Maximum marginal relevance (MMR) is calculated by reranking the retrieved documents to balance relevance and diversity, ensuring a relevant and varied result list.
For Q1:
- Initial retrieved nodes: (N2,N3,N1)
- First selection based on highest relevance: N1 (Rel = 0.9)
- Next, we calculate MMR for the remaining nodes (N2 and N3):
- MMR(N2)=0.5×0.7−0.5×max(0.5,0.2)=0.1
- MMR(N3)=0.5×0.6−0.5×max(0.3,0.2)=0.15
- Next, select N3 as it has the highest MMR score.
- Only N2 left.
Final order for Q1: (N1,N3,N2)
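The Q1 walk-through above can be reproduced with a short greedy reranker. This is a minimal sketch under stated assumptions: the function `mmr_rerank` and its arguments are hypothetical names, similarities are treated as symmetric, and each candidate is penalized only by its similarity to nodes that have already been selected (the set R in the formula).

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.5):
    """Greedily reorder candidates by maximum marginal relevance."""
    selected = []
    remaining = list(candidates)

    def score(node):
        # Largest similarity to any already selected node (0.0 if none yet).
        redundancy = max(
            (similarity[frozenset((node, chosen))] for chosen in selected),
            default=0.0,
        )
        return lam * relevance[node] - (1.0 - lam) * redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Relevance and similarity scores for Q1, taken from the example above.
relevance_q1 = {"N2": 0.7, "N3": 0.6, "N1": 0.9}
similarity_q1 = {
    frozenset(("N2", "N3")): 0.2,
    frozenset(("N2", "N1")): 0.5,
    frozenset(("N3", "N1")): 0.3,
}

print(mmr_rerank(["N2", "N3", "N1"], relevance_q1, similarity_q1))
# ['N1', 'N3', 'N2']
```

With nothing selected yet, the first pick reduces to the most relevant node (N1). In the second round N3 scores 0.15 against N2's 0.1, so N3 is placed ahead of N2 even though N2 is more relevant to the query.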
For Q2:
- Initial retrieved nodes: (N3,N5,N1)
- First selection based on highest relevance: N3 (Rel = 0.9)
- Next, we calculate MMR for the remaining nodes (N5 and N1):
- MMR(N5)=0.5×0.3−0.5×max(0.4,0.6)=−0.15
- MMR(N1)=0.5×0.6−0.5×max(0.3,0.6)=0
- Next, select N1 as it has the highest (non-negative) MMR score.
- Only N5 left.
Final order for Q2: (N3,N1,N5)
For Q3:
- Initial retrieved nodes: (N1,N2,N4)
- First selection based on highest relevance: N1 (Rel = 0.8)
- Next, we calculate MMR for the remaining nodes (N2 and N4):
- MMR(N2)=0.5×0.5−0.5×max(0.3,0.5)=0
- MMR(N4)=0.5×0.4−0.5×max(0.4,0.5)=−0.05
- Next, select N2 as it has the highest MMR score.
- Only N4 left.
Final order for Q3: (N1,N2,N4)
Using MMR, we re-rank the nodes to ensure a balance between relevance and diversity. The final re-ranked nodes are:
- Q1: (N1,N3,N2)
- Q2: (N3,N1,N5)
- Q3: (N1,N2,N4)
Conclusion
Metrics such as hit rate, mean reciprocal rank (MRR), and maximum marginal relevance (MMR) are essential for assessing and improving the effectiveness of RAG systems. Hit rate measures how often the relevant information is retrieved and MRR additionally rewards retrieving it at higher positions, while MMR maintains a balance between relevance and diversity in the retrieved results. By optimizing these metrics, RAG systems can greatly improve the quality and relevance of the answers they generate, thereby increasing user satisfaction and trust.
Frequently asked questions
A. We determine it by dividing the number of queries for which a relevant item appears in the top-N results by the total number of queries.
A. A reranking technique called Maximum Marginal Relevance (MMR) strikes a balance between the relevance and diversity of the returned items. By taking into account the relevance of a document to the query and its similarity to previously selected items, it seeks to reduce redundancy.
A. In RAG systems, the hit rate (a measure of how often relevant information is retrieved) is essential to producing accurate and contextually relevant answers. A higher hit rate indicates greater success in retrieving relevant information.
A. MMR minimizes redundancy by ensuring that the set of retrieved documents is both diverse and relevant, which helps produce comprehensive responses that address all facets of the query.