- Explaining LLMs is very slow and resource-intensive.
- This article proposes a task-specific explainability technique for RAG Q&A and summarization.
- The approach is model-agnostic and works with sentence similarity.
- It is low-resource and low-latency, so it can run almost anywhere.
- The code is available on GitHub and uses the Hugging Face Transformers ecosystem.
There are many good reasons to get explanations for your model results. For example, they could help you find problems with your model, or they could simply be a way to provide more transparency to the user and thereby build trust. That's why, for models like XGBoost, I have regularly applied methods like SHAP to get more insight into my models' behavior.
Now that I deal more and more with LLM-based machine learning systems, I wanted to explore ways to explain LLM models in the same way I did with more traditional ML approaches. However, I quickly found myself stuck because:
- SHAP offers examples for text-based models, but in my case they failed with newer models, as SHAP did not support their embedding layers.
- Captum also offers a tutorial for LLM attribution (https://captum.ai/tutorials/Llama2_LLM_Attribution); however, both methods presented there had very specific drawbacks. Specifically, the perturbation-based method was simply too slow, while the gradient-based method caused my GPU memory to explode and eventually crash.
After playing with quantization and even spinning up GPU cloud instances with still limited success, I'd had enough and stepped back.
To understand the approach, let's first briefly define what we want to achieve. Specifically, we want to identify and highlight sections in our input text (for example, a long text document or a RAG context) that are highly relevant to our model's output (for example, a summary or a RAG answer).
In the case of summarization, our method would have to highlight the parts of the source text that are largely reflected in the summary. In the case of a RAG system, our approach would have to highlight the document fragments from the RAG context that show up in the answer.
Since directly explaining the LLM itself has proven intractable for me, I propose to instead model the relationship between model inputs and outputs through a separate text similarity model. Specifically, I implemented the following simple but effective approach:
- I split the model inputs and outputs into sentences.
- I calculate pairwise similarities between all sentences.
- I then normalize the similarity scores using softmax.
- Finally, I visualize the similarities between input and output sentences in a plot.
In code, this is implemented as shown below. To run the code, you need the Hugging Face Transformers, Sentence Transformers, and NLTK libraries.
Please also have a look at this GitHub repository for the full code accompanying this blog post.
from sentence_transformers import SentenceTransformer
from nltk.tokenize import sent_tokenize
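# Note: sent_tokenize relies on NLTK's 'punkt' tokenizer data;
# download it once if it is not already available
import nltk
nltk.download('punkt', quiet=True)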
import numpy as np

# Original text truncated for brevity ...
text = """This section briefly summarizes the state of the art in the area of semantic segmentation and semantic instance segmentation. As the majority of state-of-the-art techniques in this area are deep learning approaches we will focus on this area. Early deep learning-based approaches that aim at assigning semantic classes to the pixels of an image are based on patch classification. Here the image is decomposed into superpixels in a preprocessing step e.g. by applying the SLIC algorithm (1).
Other approaches are based on so-called Fully Convolutional Neural Networks (FCNs). Here not an image patch but the whole image are taken as input and the output is a two-dimensional feature map that assigns class probabilities to each pixel. Conceptually FCNs are similar to CNNs used for classification but the fully connected layers are usually replaced by transposed convolutions which have learnable parameters and can learn to upsample the extracted features to the final pixel-wise classification result. ..."""
# Define a concise summary that captures the key points
summary = "Semantic segmentation has evolved from early patch-based classification approaches using superpixels to more advanced Fully Convolutional Networks (FCNs) that process entire images and output pixel-wise classifications."
# Load the embedding model
model = SentenceTransformer('BAAI/bge-small-en')
# Split texts into sentences
input_sentences = sent_tokenize(text)
summary_sentences = sent_tokenize(summary)
# Calculate embeddings for all sentences
input_embeddings = model.encode(input_sentences)
summary_embeddings = model.encode(summary_sentences)
# Calculate similarity matrix using cosine similarity
similarity_matrix = np.zeros((len(summary_sentences), len(input_sentences)))
for i, sum_emb in enumerate(summary_embeddings):
    for j, inp_emb in enumerate(input_embeddings):
        similarity = np.dot(sum_emb, inp_emb) / (np.linalg.norm(sum_emb) * np.linalg.norm(inp_emb))
        similarity_matrix[i, j] = similarity
# Calculate final attribution scores (mean aggregation)
final_scores = np.mean(similarity_matrix, axis=0)
# Create and print attribution dictionary
attributions = {
    sentence: float(score)
    for sentence, score in zip(input_sentences, final_scores)
}
print("\nInput sentences and their attribution scores:")
for sentence, score in attributions.items():
    print(f"\nScore {score:.3f}: {sentence}")
As you can see, this is pretty simple so far. Obviously, we do not explain the model itself. However, we may still get a good idea of the relations between input and output sentences for this specific type of task (summarization / RAG Q&A). But how does this actually work in practice, and how do we visualize the attribution results so that they make sense?
To visualize the results of this approach, I created two visualizations that are suitable for showing the feature attributions and the connections between the input and output of the LLM, respectively.
These visualizations were generated for a summary of the LLM input, which reads as follows:
This section reviews the state of the art in semantic segmentation and instance segmentation, focusing on deep learning approaches. Early patch classification methods use superpixels, while more recent fully convolutional networks (FCNs) predict class probabilities for each pixel. FCNs are similar to CNNs but use transposed convolutions for upsampling. Standard architectures include U-Net and VGG-based FCNs, which are optimized for computational efficiency and feature size. For instance segmentation, proposal-based and instance-embedding-based techniques are reviewed, including the use of proposals for instance segmentation and the concept of instance embeddings.
Visualizing feature attributions
To visualize the feature attributions, my choice was to stick as closely as possible to the original representation of the input data.
Specifically, I simply plot the sentences, including their calculated attribution scores, and map the attribution scores to the colors of the respective sentences.
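The actual plotting code is in the GitHub repository; as a minimal sketch of the idea (using matplotlib here purely as an example, and assuming the input_sentences and final_scores from the snippet above), coloring sentences by their attribution score could look roughly like this:

import textwrap
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

def plot_sentence_attributions(sentences, scores, wrap_width=90):
    # Normalize scores to [0, 1] so they can be mapped onto a colormap
    scores = np.asarray(scores, dtype=float)
    norm = (scores - scores.min()) / (scores.max() - scores.min() + 1e-9)

    # Wrap each sentence and remember which score every wrapped line belongs to
    lines, line_scores = [], []
    for sentence, s in zip(sentences, norm):
        wrapped = textwrap.wrap(sentence, wrap_width) or [""]
        lines.extend(wrapped)
        line_scores.extend([s] * len(wrapped))

    fig, ax = plt.subplots(figsize=(12, 0.35 * len(lines)))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, len(lines))
    ax.axis("off")

    # Draw each line with a background color proportional to its attribution score
    for i, (line, s) in enumerate(zip(lines, line_scores)):
        ax.text(0.01, len(lines) - i - 0.5, line, fontsize=9, va="center",
                backgroundcolor=cm.Blues(0.15 + 0.55 * float(s)))
    plt.tight_layout()
    plt.show()

plot_sentence_attributions(input_sentences, final_scores)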
In this case, this shows us some dominant patterns in the summary and the source sentences that the information could come from. Specifically, the predominance of FCNs as the architectural variant mentioned in the text, as well as the mention of proposal-based and instance-embedding-based instance segmentation methods, are clearly highlighted.
Overall, this method turned out to work quite well for easily capturing attributions on the input of a summarization task, as it stays very close to the original representation and adds very little clutter to the data. I can also imagine providing such a visualization to the user of a RAG system on demand. Potentially, the results could also be further processed by thresholding to identify particularly relevant chunks; these could then be shown to the user by default to highlight relevant sources.
Again, check the GitHub repository for the visualization code.
Visualizing the flow of information
Another visualization technique does not focus on feature attributions, but primarily on the flow of information between the input text and the summary.
Specifically, what I do here is first determine the main connections between input and output sentences based on the attribution scores. I then visualize those connections using a Sankey diagram, where the width of the flow connections represents the connection strength and the coloring is based on the summary sentences for better traceability.
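The full implementation is again in the repository; as a minimal sketch (using Plotly, assuming the similarity_matrix and sentence lists from above, and taking the top three most similar input sentences per summary sentence as the main connections), the diagram could be built roughly like this:

import numpy as np
import plotly.graph_objects as go

top_k = 3  # number of main connections kept per summary sentence

# Node labels: input sentences first, then summary sentences (shortened for readability)
labels = [f"Input {i}" for i in range(len(input_sentences))] + \
         [f"Summary {j}" for j in range(len(summary_sentences))]

palette = ["rgba(31,119,180,0.5)", "rgba(255,127,14,0.5)", "rgba(44,160,44,0.5)",
           "rgba(214,39,40,0.5)", "rgba(148,103,189,0.5)"]

sources, targets, values, colors = [], [], [], []
for j in range(similarity_matrix.shape[0]):              # loop over summary sentences
    top_inputs = np.argsort(similarity_matrix[j])[::-1][:top_k]
    for i in top_inputs:                                  # keep only the strongest input sentences
        sources.append(int(i))                            # input node index
        targets.append(len(input_sentences) + j)          # summary node index
        values.append(float(similarity_matrix[j, i]))     # link width = similarity
        colors.append(palette[j % len(palette)])          # color links by summary sentence

fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=15, thickness=12),
    link=dict(source=sources, target=targets, value=values, color=colors),
))
fig.show()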
Here, we can see that the summary mostly follows the order of the text. However, there are a few places where the LLM combined information from the beginning and the end of the text; for example, the summary mentions a focus on deep learning approaches in the first sentence, which is taken from the last sentence of the input text and is clearly visible in the flow diagram.
Overall, I found this useful, especially for getting a sense of how much the LLM aggregates information from different parts of the input, rather than simply copying or rephrasing certain parts. In my opinion, this can also help estimate the potential for error when an output depends too heavily on the LLM to establish connections between different bits of information.
In the code provided on GitHub, I implemented some extensions of the basic approach shown in the previous sections. Specifically, I explored the following (sketched in code after the list):
- Use of different aggregations, such as max, for the similarity score. This may make sense since the average similarity to the output sentences is not necessarily what matters; a single strong match could already be relevant for our explanation.
- Use of different window sizes, for example, taking chunks of three sentences to compute similarities. Again, this makes sense if you suspect that a single sentence does not contain enough content to truly capture the relationship between two passages, so a larger context is needed.
- Use of cross-encoder-based models, such as rerankers. This could be useful, as rerankers model the relationship between two input documents much more explicitly within one model and are therefore more sensitive to nuanced language in the two documents. See also my recent post on Towards Data Science.
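As a rough sketch of these three variations, assuming the similarity_matrix, embeddings, and sentence lists from the snippet above (the cross-encoder model name below is only an example, not necessarily the one used in the repository):

import numpy as np
from sentence_transformers import CrossEncoder, util

# 1) Max aggregation: a single strong match is enough to mark an input sentence as relevant
max_scores = np.max(similarity_matrix, axis=0)

# 2) Windowing: embed chunks of three consecutive input sentences instead of single sentences
window_size = 3
windows = [" ".join(input_sentences[i:i + window_size])
           for i in range(0, len(input_sentences), window_size)]
window_embeddings = model.encode(windows)
window_similarities = util.cos_sim(summary_embeddings, window_embeddings)

# 3) Cross-encoder (reranker): score each (summary sentence, input sentence) pair jointly
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker model
pairs = [[summ, inp] for summ in summary_sentences for inp in input_sentences]
ce_scores = np.array(reranker.predict(pairs)).reshape(
    len(summary_sentences), len(input_sentences))
ce_attributions = np.max(ce_scores, axis=0)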
As mentioned, all of this is demonstrated in the provided code, so be sure to check that out as well.
In general, I found it quite difficult to find tutorials that actually demonstrate explainability techniques for non-toy RAG and summarization scenarios. Techniques that are useful in real-time settings and therefore provide low latency seemed especially scarce. However, as shown in this post, simple solutions can already give pretty good results when it comes to showing relationships between documents and answers in a RAG use case. I'll definitely explore this further and see how I can use it in production RAG scenarios, as providing traceable results to users has been invaluable to me. If you are interested in the topic and want more content in this style, follow me here on Medium and on LinkedIn.