In the rapidly growing area of digital healthcare, medical chatbots are becoming an important tool
to improve patient care and provide fast, reliable information. This article walks through building a medical chatbot that uses multiple vector stores, focusing on a chatbot that can understand medical reports uploaded by users and answer questions grounded in those reports.
Additionally, the chatbot draws on a second vector store filled with conversations between doctors and patients on various medical topics. This approach gives it a broad base of medical knowledge and patient-interaction examples, helping it provide personalized and relevant answers to users' questions. The goal of this article is to offer developers and healthcare professionals clear guidance on building a medical chatbot that can be a useful resource for patients seeking information and advice based on their own reports and health concerns.
Learning objectives
- Learn how to use open source medical datasets to train a chatbot on doctor-patient conversations.
- Understand how to create and deploy a vector store service for efficient data retrieval.
- Gain skills in integrating large language models (LLMs) and embeddings to improve chatbot performance.
- Learn how to create a multi-vector chatbot using LangChain, Milvus, and Cohere to improve AI conversations.
- Understand how to integrate vector stores and retrieval mechanisms for efficient, context-aware chatbot responses.
This article was published as part of the Data Science Blogathon.
Creation of a multi-vector chatbot with LangChain, Milvus and Cohere
Building a medical chatbot capable of understanding and responding to queries based on medical reports and conversations requires a carefully designed process. This pipeline integrates several services and data sources to process user queries and return accurate, contextual responses. Below we outline the steps required to build this chatbot pipeline.
Note: Services like the logger, vector store, LLM, and embeddings were imported from other modules. You can access them from this repository. Make sure you add all API keys and vector store URLs before running the notebook.
Step 1: Import necessary libraries and modules
We start by importing the necessary Python libraries and modules. The dotenv library loads environment variables, which is essential for managing sensitive information such as API keys securely. The src.services module contains custom classes for interacting with various services such as vector stores, embeddings, and language models, while the Ingestion class from src.ingest handles document ingestion. We also import several LangChain and langchain_core components to handle information retrieval and response generation based on the chatbot's memory and conversation history.
import pandas as pd
from dotenv import load_dotenv
from src.services import LLMFactory, VectorStoreFactory, EmbeddingsFactory
from src.ingest import Ingestion
from langchain_core.prompts import ChatPromptTemplate
from langchain.retrievers.ensemble import EnsembleRetriever
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain.memory import ConversationBufferWindowMemory, SQLChatMessageHistory
_ = load_dotenv()
Step 2: Load data
Next, we load the conversation dataset from the data directory. The dataset can be downloaded from this url. It is essential for providing the LLM with a knowledge base to draw on when answering user queries.
data = pd.read_parquet("data/medqa.parquet", engine="pyarrow")
data.head()
Viewing the data, we can see that it has three columns: input, output, and instruction. We will use only the input and output columns, since they hold the patient's query and the doctor's response, respectively.
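To make the upcoming ingestion step concrete, here is an illustrative sketch of how each Q&A row could be turned into a LangChain Document carrying the metadata the retrievers filter on later. This is not the repository's actual ingestion logic, just the idea behind it.
from langchain_core.documents import Document

# Illustrative only: pair each patient query with the doctor's reply and
# attach the category/sub_category metadata used for filtered retrieval.
conversation_docs = [
    Document(
        page_content=f"Patient: {row.input}\nDoctor: {row.output}",
        metadata={"category": "medical", "sub_category": "conversation"},
    )
    for row in data.itertuples()
]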
Step 3: Data ingestion
An instance of the Ingestion class is created with the embedding and vector store services we want to use. This setup is crucial for processing and storing the medical data in a form the chatbot can retrieve. We ingest the conversation dataset first, since it takes the longest. The ingestion pipeline is tuned to process large content in batches, pausing between batches, to avoid rate-limit errors from the embedding service; if you use a paid tier that lifts those limits, you can change the logic in the src directory to ingest all content at once (a sketch of this batching idea follows the ingestion calls below). For the report example, we use a patient report available online; you can download the report from here.
ingestion = Ingestion(
    embeddings_service="cohere",
    vectorstore_service="milvus",
)
ingestion.ingest_document(
    file_path="data/medqa.parquet",
    category="medical",
    sub_category="conversation",
    exclude_columns=["instruction"],  # drop the column we are not using
)
ingestion.ingest_document(
    file_path="data/anxiety-patient.pdf",
    category="medical",
    sub_category="document",
)
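For reference, here is a minimal sketch of the kind of batched ingestion loop described above. The batch size, pause interval, and function name are assumptions for illustration; the actual logic lives in the repository's src directory.
import time

def ingest_in_batches(docs, vectorstore, batch_size=50, pause_seconds=60):
    # Hypothetical batching loop: embed and store documents in small batches,
    # pausing between batches so free-tier embedding APIs don't hit rate limits.
    for start in range(0, len(docs), batch_size):
        batch = docs[start : start + batch_size]
        vectorstore.add_documents(batch)  # embeds and stores this batch
        if start + batch_size < len(docs):
            time.sleep(pause_seconds)  # wait out the rate-limit window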
Step 4: Service initialization
The EmbeddingsFactory, VectorStoreFactory, and LLMFactory classes instantiate the embeddings, vector store, and language model services, respectively. You can download these modules from the repository mentioned at the beginning of this section. They include a built-in logger for observability and offer a choice of embeddings, LLM, and vector store services.
embeddings_instance = EmbeddingsFactory.get_embeddings(
embeddings_service="cohere",
)
vectorstore_instance = VectorStoreFactory.get_vectorstore(
vectorstore_service="milvus", embeddings=embeddings_instance
)
llm = LLMFactory.get_chat_model(llm_service="cohere")
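If you prefer not to use the repository, a factory like the ones above is straightforward to approximate. The sketch below is an assumption about the pattern rather than the repository's actual code; it simply maps a service name to the corresponding LangChain client.
from langchain_cohere import ChatCohere

class SimpleLLMFactory:
    # Hypothetical minimal stand-in for LLMFactory: maps a service name
    # to a chat model instance (ChatCohere reads COHERE_API_KEY from env).
    @staticmethod
    def get_chat_model(llm_service: str):
        if llm_service == "cohere":
            return ChatCohere()
        raise ValueError(f"Unknown LLM service: {llm_service}")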
Step 5: Create Retrievers
We create two retrievers from the vector store instance: one for conversations (doctor-patient interactions) and one for documents (medical reports). We configure both for similarity search, with metadata filters that limit each search to the relevant category and subcategory. We then combine them into an ensemble retriever.
conversation_retriever = vectorstore_instance.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 6,
        "fetch_k": 12,
        "filter": {
            "category": "medical",
            "sub_category": "conversation",
        },
    },
)
document_retriever = vectorstore_instance.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 6,
        "fetch_k": 12,
        "filter": {
            "category": "medical",
            "sub_category": "document",
        },
    },
)
ensemble_retriever = EnsembleRetriever(
    retrievers=[conversation_retriever, document_retriever],
    weights=[0.4, 0.6],
)
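A quick way to see the ensemble at work is to invoke it directly and inspect which subcategory each hit came from. EnsembleRetriever merges the two ranked lists with weighted Reciprocal Rank Fusion, so with the weights above the document store gets a slight preference (0.6 vs. 0.4). The query string here is just an example.
# Example query (hypothetical) to check that both stores contribute results.
hits = ensemble_retriever.invoke("What are common treatments for anxiety?")
for doc in hits[:5]:
    print(doc.metadata.get("sub_category"), "->", doc.page_content[:80])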
Step 6: Manage conversation history
We set up a SQL-backed store for chat history, which is crucial for maintaining context throughout a conversation. This lets the chatbot reference previous interactions, ensuring consistent and contextually relevant responses.
history = SQLChatMessageHistory(
    session_id="ghdcfhdxgfx",  # any unique id for this conversation
    connection_string="sqlite:///.cache/chat_history.db",
    table_name="message_store",
    session_id_field_name="session_id",
)
memory = ConversationBufferWindowMemory(chat_memory=history)
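As a quick illustration (not part of the original pipeline), messages written under a session_id persist in the SQLite file, so re-running the notebook with the same id restores the earlier conversation:
# Hypothetical smoke test: write two messages and read them back.
history.add_user_message("Hello")
history.add_ai_message("Hi, how can I help you today?")
print(history.messages)  # persisted in .cache/chat_history.db across runs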
Step 7: Generate responses
A ChatPromptTemplate defines the structure and instructions for the chatbot's responses. It tells the model how to combine the retrieved context, the chat history, and the user's input into a detailed and accurate answer.
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """
{context}""",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)
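To see what the template produces, you can render it with sample values; the values below are purely illustrative (in the real chain, create_stuff_documents_chain fills {context} with the retrieved documents):
# Illustrative rendering of the prompt with dummy values.
preview = prompt.invoke({
    "context": "Ann reports frequent worry and poor sleep.",
    "chat_history": [],
    "input": "Summarize Ann's main symptoms.",
})
print(preview.to_messages())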
Step 8: Create a history-aware RAG chain
Now that all the components are ready, we stitch them together into a RAG chain.
question_answer_chain = create_stuff_documents_chain(llm, prompt)
history_aware_retriever = create_history_aware_retriever(
    llm, ensemble_retriever, prompt
)
rag_chain = create_retrieval_chain(
    history_aware_retriever, question_answer_chain,
)
Now the pipeline is ready to receive user queries. The chatbot processes each query through the retrieval chain, which retrieves relevant information and generates a response using the language model and the prompt template. Let's try the pipeline with some queries.
response = rag_chain.invoke({
    "input": "Give me a list of major anxiety issues of Ann.",
})
print(response["answer"])
The model was able to answer the query from the PDF document, which we can verify by inspecting the sources.
Next, let's draw on the ingested conversation data and check whether the LLM uses it to answer something that is not mentioned in the PDF.
response = rag_chain.invoke({
    "input": "Ann seems to have insomnia. What can she do to fix it?",
})
print(response["answer"])
Checking the answer against the sources, we can see that the LLM indeed drew on the conversation store to answer this new query.
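One detail worth noting: the calls above pass only the input, so the {chat_history} placeholder stays empty. To carry context across turns, prior messages can be fed back in explicitly. A sketch, with a hypothetical follow-up question:
# Hypothetical multi-turn call: supply the stored history so the
# history-aware retriever can resolve references like "she" to Ann.
history.add_user_message("Ann seems to have insomnia. What can she do to fix it?")
history.add_ai_message(response["answer"])
followup = rag_chain.invoke({
    "input": "Should she also cut down on caffeine?",
    "chat_history": history.messages,
})
print(followup["answer"])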
Conclusion
Building a medical chatbot, as described in this guide, represents a significant advance in the application of artificial intelligence and machine learning in healthcare. By taking advantage of a sophisticated pipeline that integrates vector stores, embeddings, and large language models, we can create a chatbot capable of understanding and responding to complex medical queries with high accuracy and relevance. Such a chatbot not only improves access to medical information for patients and healthcare seekers, but also demonstrates the potential of AI to support and augment healthcare services. The pipeline's flexible and scalable architecture ensures it can evolve to meet future needs, incorporating new data sources, models, and technologies as they become available.
In conclusion, this medical chatbot project is a step forward on the path towards smarter, more accessible, and more supportive healthcare tools. It highlights the importance of integrating advanced technologies, managing data effectively, and maintaining conversational context, laying the foundation for future innovations in the field.
Key takeaways
- Discover the process of creating a multi-vector chatbot with LangChain, Milvus, and Cohere to sustain fluid conversations.
- Explore integrating vector stores to enable efficient, context-aware responses in a multi-vector chatbot.
- The success of a medical chatbot depends on accurate processing of medical data and a well-trained model.
- Customization and scalability are key to creating a useful and adaptable medical assistant.
- Leveraging embeddings and LLMs improves the chatbot's ability to provide accurate, contextual responses.
Frequently asked questions
Q1. What is a medical chatbot?
A. A medical chatbot provides medical advice, information, and support to users through conversational interfaces powered by artificial intelligence.
Q2. How does the chatbot generate its answers?
A. It uses large language models (LLMs) and a structured knowledge base to process medical data and generate answers to user queries.
Q3. What role do vector stores play?
A. Vector stores hold vector representations of text data, enabling efficient retrieval of relevant information for chatbot responses.
Q4. How can the chatbot be personalized?
A. Personalization involves tailoring the chatbot's responses to specific user data, such as medical history or preferences, to provide more accurate and relevant assistance.
Q5. Are privacy and security a concern?
A. Yes, ensuring the privacy and security of user data is essential, since medical chatbots handle sensitive information that must comply with regulations such as HIPAA.
The media shown in this article is not the property of Analytics Vidhya and is used at the author's discretion.