So you are creating a RAG system or using an LLM to chat with documents. But users often ask: how can we trust the answers?
In addition, we often hear about hallucinations that undermine users' confidence.
If we build an app that can't show users where its responses come from, it may become unusable in settings where answers must be verified.
In this article, I will share an approach to address this concern. By linking each response generated by the LLM to its source text in the document, we can build transparency and trust. This method not only provides clear evidence for each answer, but also allows users to verify the results directly in the PDF.
Sometimes the generated response may not be perfectly accurate, but being able to locate the correct source text is already helpful to the user.
Let's take this document from arxiv.org as an example. We can imagine the following use case:
The first step of this approach is to extract the text from the PDF in a structured format.
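As a minimal sketch of this step, the code below extracts each word of a PDF together with its bounding box, then groups words into lines so they can later be matched against an LLM answer and highlighted. It assumes PyMuPDF (`pip install pymupdf`) as the extraction library; the `Word` dataclass and `group_into_lines` helper are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One word of the PDF with its position (coordinates in PDF points)."""
    text: str
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def extract_words(pdf_path: str) -> List[Word]:
    """Extract every word with its bounding box using PyMuPDF.

    Assumes PyMuPDF is installed; imported lazily so the rest of the
    module can be used (and tested) without it.
    """
    import fitz  # PyMuPDF

    words: List[Word] = []
    with fitz.open(pdf_path) as doc:
        for page_no, page in enumerate(doc):
            # get_text("words") yields (x0, y0, x1, y1, word, block, line, word_no)
            for x0, y0, x1, y1, text, *_ in page.get_text("words"):
                words.append(Word(text, page_no, x0, y0, x1, y1))
    return words


def group_into_lines(words: List[Word], y_tolerance: float = 2.0) -> List[str]:
    """Group words into text lines by page and vertical position.

    Words whose top coordinates differ by less than `y_tolerance`
    points are treated as belonging to the same line.
    """
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.page, w.y0, w.x0)):
        if (
            lines
            and lines[-1][-1].page == w.page
            and abs(lines[-1][-1].y0 - w.y0) <= y_tolerance
        ):
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(w.text for w in line) for line in lines]
```

Keeping the bounding boxes alongside the text is what later lets us highlight the exact source region in the PDF viewer, rather than only quoting the text.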