Tired of manually combing through hours of audio to find the key ideas? This guide teaches you to build an AI chatbot that turns recordings (meetings, podcasts, interviews) into interactive conversations. Using AssemblyAI for precise transcription with speaker labels, Qdrant for fast vector storage, and DeepSeek-R1 via SambaNova Cloud for smart answers, we will create a RAG tool that answers questions such as "What did [speaker] say?" or "Summarize this segment." In short, we convert your audio into an AI-searchable dialogue by building a RAG system with AssemblyAI, Qdrant, and DeepSeek-R1.
Learning objectives
- Leverage the AssemblyAI API to transcribe audio files with speaker labels, converting conversations into structured text data for analysis.
- Set up the Qdrant vector database to efficiently store and retrieve embeddings of transcribed audio content generated with Hugging Face models.
- Implement RAG with the DeepSeek-R1 model via SambaNova Cloud to generate context-aware answers.
- Create a Streamlit web interface for users to upload audio files, view transcripts, and interact with the chatbot in real time.
- Build an end-to-end workflow that combines audio processing, vector storage, and AI-driven response generation to create a scalable audio-based chat application.
This article was published as part of the Data Science Blogathon.
What is AssemblyAI?
AssemblyAI is your go-to tool for turning audio into actionable insights. Whether you are transcribing podcasts, analyzing customer calls, or captioning videos, its speech-to-text engine delivers top-tier accuracy, even with accents or background noise.

What is SambaNova Cloud?
Imagine running massive open-source models such as DeepSeek-R1 (671B) up to 10 times faster, and without the usual infrastructure headaches.

Instead of relying on GPUs, SambaNova uses RDUs (Reconfigurable Dataflow Units), which unlock faster performance with:
- Massive in-memory storage: no constant model reloading
- Efficient dataflow design: optimized for high-throughput tasks
- Instant model switching: switch between models in microseconds
- Run DeepSeek-R1 instantly, no complicated setup required
- Train and fine-tune on the same platform, all in one place
What is Qdrant?
Qdrant is a lightning-fast vector database built to power AI applications; think of it as a search engine that finds needles in haystacks. Whether you are creating a recommendation system, an image search tool, or a chatbot, Qdrant specializes in similarity search, quickly identifying the closest matches for complex data such as text embeddings or visual features.

What is DeepSeek-R1?
DeepSeek-R1 is a game-changing language model that combines human-like adaptability with cutting-edge AI, making it a standout in natural language processing. Whether you are creating content, translating languages, debugging code, or summarizing complex reports, R1 excels at understanding context, tone, and intent, delivering answers that feel intuitive rather than robotic. By prioritizing empathy and precision, DeepSeek-R1 is not just a tool; it is a glimpse of a future where AI communicates as naturally as we do.

Building the RAG model with AssemblyAI and DeepSeek-R1
Now that we understand all the components, let's dive into building our RAG pipeline. But before we do, let's quickly cover what you will need to get started.
1. Prerequisites
Below are the required prerequisites:
Clone the repository:
git clone https://github.com/karthikponna/chat_with_audios.git
cd chat_with_audios
Create and activate the virtual environment:
# For macOS and Linux:
python3 -m venv venv
source venv/bin/activate
# For Windows:
python -m venv venv
.\venv\Scripts\activate
Install required dependencies:
pip install -r requirements.txt
Configure environment variables:
Create a `.env` file and add your AssemblyAI and SambaNova (https://cloud.sambanova.ai/apis) API keys:
ASSEMBLYAI_API_KEY="your_assemblyai_api_key_string"
SAMBANOVA_API_KEY="your_sambanova_api_key_string"
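As a quick, optional sanity check (not part of the project code), you can confirm that the keys are picked up correctly using python-dotenv, which the project already relies on:

# quick sanity check that the .env file is loaded correctly
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

assert os.getenv("ASSEMBLYAI_API_KEY"), "ASSEMBLYAI_API_KEY is missing"
assert os.getenv("SAMBANOVA_API_KEY"), "SAMBANOVA_API_KEY is missing"
print("API keys loaded.")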
Now let's start with the coding part.
2. Retrieval Augmented Generation (RAG)
RAG fuses large language models with external data to produce more accurate, context-aware answers. It fetches relevant information at query time, ensuring that answers are grounded in real data rather than relying on model training alone.
2.1 Import necessary libraries
We create a file called rag_code.py. We will walk through the code step by step, starting by importing the necessary modules and orchestrating the code architecture using LlamaIndex (https://www.llamaindex.ai/).
from qdrant_client import models
from qdrant_client import QdrantClient
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.sambanovasystems import SambaNovaCloud
from llama_index.llms.ollama import Ollama
import assemblyai as aai
from typing import List, Dict
from llama_index.core.base.llms.types import (
ChatMessage,
MessageRole,
)
2.2 Batch processing and embedding with Hugging Face
Here the batch_iterate function splits a list of texts into smaller chunks, making it easier to process large datasets. The EmbedData class then loads a Hugging Face embedding model, generates embeddings for each batch of text, and collects these embeddings for later use.
def batch_iterate(lst, batch_size):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), batch_size):
        yield lst[i : i + batch_size]

class EmbedData:
    def __init__(self, embed_model_name="BAAI/bge-large-en-v1.5", batch_size=32):
        self.embed_model_name = embed_model_name
        self.embed_model = self._load_embed_model()
        self.batch_size = batch_size
        self.embeddings = []

    def _load_embed_model(self):
        embed_model = HuggingFaceEmbedding(model_name=self.embed_model_name, trust_remote_code=True, cache_folder="./hf_cache")
        return embed_model

    def generate_embedding(self, context):
        return self.embed_model.get_text_embedding_batch(context)

    def embed(self, contexts):
        self.contexts = contexts
        for batch_context in batch_iterate(contexts, self.batch_size):
            batch_embeddings = self.generate_embedding(batch_context)
            self.embeddings.extend(batch_embeddings)
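As a quick illustration (not part of the project files), here is how EmbedData might be used on a couple of short transcript snippets; the texts are made up, and the model name and batch size are simply the defaults from the class above.

# hypothetical usage of EmbedData on two transcript snippets
embeddata = EmbedData(embed_model_name="BAAI/bge-large-en-v1.5", batch_size=32)
embeddata.embed([
    "Speaker A: Welcome to the show.",
    "Speaker B: Thanks for having me.",
])
print(len(embeddata.embeddings))     # one embedding per context
print(len(embeddata.embeddings[0]))  # 1024 dimensions for bge-large-en-v1.5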
2.3 Qdrant vector database setup and ingestion
- The QdrantVDB_QB class initializes a Qdrant vector database by configuring key parameters such as the collection name, vector dimension, and batch size, and connects to Qdrant while checking for an existing collection (creating one if necessary).
- Its ingest_data method efficiently uploads the text contexts together with their corresponding embeddings in batches and then updates the collection configuration accordingly.
class QdrantVDB_QB:
    def __init__(self, collection_name, vector_dim=768, batch_size=512):
        self.collection_name = collection_name
        self.batch_size = batch_size
        self.vector_dim = vector_dim

    def define_client(self):
        self.client = QdrantClient(url="http://localhost:6333", prefer_grpc=True)

    def create_collection(self):
        if not self.client.collection_exists(collection_name=self.collection_name):
            self.client.create_collection(collection_name=f"{self.collection_name}",
                                          vectors_config=models.VectorParams(size=self.vector_dim,
                                                                             distance=models.Distance.DOT,
                                                                             on_disk=True),
                                          optimizers_config=models.OptimizersConfigDiff(default_segment_number=5,
                                                                                        indexing_threshold=0),
                                          quantization_config=models.BinaryQuantization(
                                              binary=models.BinaryQuantizationConfig(always_ram=True)),
                                          )

    def ingest_data(self, embeddata):
        for batch_context, batch_embeddings in zip(batch_iterate(embeddata.contexts, self.batch_size),
                                                   batch_iterate(embeddata.embeddings, self.batch_size)):
            self.client.upload_collection(collection_name=self.collection_name,
                                          vectors=batch_embeddings,
                                          payload=[{"context": context} for context in batch_context])

        self.client.update_collection(collection_name=self.collection_name,
                                      optimizer_config=models.OptimizersConfigDiff(indexing_threshold=20000)
                                      )
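Note that define_client assumes a Qdrant server is already running locally at http://localhost:6333 (for example, started via Docker). Below is a minimal usage sketch that reuses the embeddata object from the previous step; the collection name matches the one used in the Streamlit app later.

# sketch: requires a local Qdrant server listening on http://localhost:6333
vector_db = QdrantVDB_QB(collection_name="chat with audios", vector_dim=1024, batch_size=512)
vector_db.define_client()                    # connect to the local Qdrant instance
vector_db.create_collection()                # create the collection if it does not exist yet
vector_db.ingest_data(embeddata=embeddata)   # upload contexts and embeddings in batches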
2.4 Query embedding retriever
- The Retriever class is designed to bridge the gap between user queries and the vector database, initializing with a vector database client and an embedding model.
- Its search method transforms a query into an embedding using that model, then performs a vector search against the database with tuned quantization parameters to quickly retrieve relevant results.
class Retriever:
    def __init__(self, vector_db, embeddata):
        self.vector_db = vector_db
        self.embeddata = embeddata

    def search(self, query):
        query_embedding = self.embeddata.embed_model.get_query_embedding(query)

        result = self.vector_db.client.search(
            collection_name=self.vector_db.collection_name,
            query_vector=query_embedding,
            search_params=models.SearchParams(
                quantization=models.QuantizationSearchParams(
                    ignore=False,
                    rescore=True,
                    oversampling=2.0,
                )
            ),
            timeout=1000,
        )
        return result
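To get a feel for what the retriever returns, here is a small illustrative sketch that reuses the vector_db and embeddata objects from above; each hit is a Qdrant scored point carrying the original text in its payload.

# sketch: run a query against the ingested collection
retriever = Retriever(vector_db=vector_db, embeddata=embeddata)
hits = retriever.search("What did the speakers discuss?")
for hit in hits:
    print(round(hit.score, 3), hit.payload["context"][:80])  # similarity score and a text preview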
2.5 Smart RAG query assistant
The RAG class integrates a retriever and an LLM to generate context-aware answers. It retrieves relevant information from the vector database, formats it into a structured prompt, and sends it to the LLM for a response. I am using SambaNovaCloud to access the LLM through its API for efficient text generation.
class RAG:
    def __init__(self,
                 retriever,
                 llm_name="Meta-Llama-3.1-405B-Instruct"
                 ):
        system_msg = ChatMessage(
            role=MessageRole.SYSTEM,
            content="You are a helpful assistant that answers questions about the user's document.",
        )
        self.messages = [system_msg, ]
        self.llm_name = llm_name
        self.llm = self._setup_llm()
        self.retriever = retriever
        self.qa_prompt_tmpl_str = ("Context information is below.\n"
                                   "---------------------\n"
                                   "{context}\n"
                                   "---------------------\n"
                                   "Given the context information above I want you to think step by step to answer the query in a crisp manner, in case you don't know the answer say 'I don't know!'.\n"
                                   "Query: {query}\n"
                                   "Answer: "
                                   )

    def _setup_llm(self):
        return SambaNovaCloud(
            model=self.llm_name,
            temperature=0.7,
            context_window=100000,
        )
        # return Ollama(model=self.llm_name,
        #               temperature=0.7,
        #               context_window=100000,
        #               )

    def generate_context(self, query):
        result = self.retriever.search(query)
        context = [dict(data) for data in result]
        combined_prompt = []

        for entry in context[:2]:
            context = entry["payload"]["context"]
            combined_prompt.append(context)

        return "\n\n---\n\n".join(combined_prompt)

    def query(self, query):
        context = self.generate_context(query=query)
        prompt = self.qa_prompt_tmpl_str.format(context=context, query=query)
        user_msg = ChatMessage(role=MessageRole.USER, content=prompt)
        # self.messages.append(ChatMessage(role=MessageRole.USER, content=prompt))
        streaming_response = self.llm.stream_complete(user_msg.content)
        return streaming_response
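Here is a brief sketch of how the class can be consumed outside Streamlit (the app below does essentially the same thing). Since the answer is streamed, we iterate over chunks; this sketch reads the standard llama-index chunk.delta field for the incremental text, whereas the app instead reads the raw provider payload.

# sketch: query the RAG engine and stream the answer to stdout
rag = RAG(retriever=retriever, llm_name="DeepSeek-R1-Distill-Llama-70B")
for chunk in rag.query("Summarize the conversation in two sentences."):
    # chunk.delta holds the newly generated text for this step (llama-index CompletionResponse)
    print(chunk.delta or "", end="", flush=True)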
2.6 Audio transcription
Here the Transcribe class is initialized by setting the AssemblyAI API key and creating a transcriber. It then processes an audio file using a configuration that enables speaker labels, ultimately returning a list of dictionaries where each entry maps a speaker to its transcribed text.
class Transcribe:
    def __init__(self, api_key: str):
        """Initialize the Transcribe class with AssemblyAI API key."""
        aai.settings.api_key = api_key
        self.transcriber = aai.Transcriber()

    def transcribe_audio(self, audio_path: str) -> List[Dict[str, str]]:
        """
        Transcribe an audio file and return speaker-labeled transcripts.

        Args:
            audio_path: Path to the audio file

        Returns:
            List of dictionaries containing speaker and text information
        """
        # Configure transcription with speaker labels
        config = aai.TranscriptionConfig(
            speaker_labels=True,
            speakers_expected=2  # Adjust this based on your needs
        )

        # Transcribe the audio
        transcript = self.transcriber.transcribe(audio_path, config=config)

        # Extract speaker utterances
        speaker_transcripts = []
        for utterance in transcript.utterances:
            speaker_transcripts.append({
                "speaker": f"Speaker {utterance.speaker}",
                "text": utterance.text
            })

        return speaker_transcripts
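To try the class on its own (outside the Streamlit app), a minimal sketch like the following should work; the audio path is just a placeholder, and the API key is read from the `.env` file created earlier.

# sketch: transcribe a local audio file and print the speaker-labeled segments
import os
from dotenv import load_dotenv

load_dotenv()  # loads ASSEMBLYAI_API_KEY from the .env file

transcriber = Transcribe(api_key=os.getenv("ASSEMBLYAI_API_KEY"))
segments = transcriber.transcribe_audio("path/to/your_audio.mp3")  # hypothetical file path
for seg in segments:
    print(f"{seg['speaker']}: {seg['text']}")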
3. Streamlit application
Streamlit is a Python library that turns data scripts into interactive web apps, making it perfect for LLM-based solutions.
- The following code creates a user-friendly app that lets users upload an audio file, view its transcript, and chat with its contents.
- AssemblyAI transcribes the uploaded audio into speaker-labeled text.
- The transcript is embedded and stored in a Qdrant vector database for efficient retrieval.
- A retriever paired with a RAG engine generates context-aware chat responses using these embeddings.
- Session state manages chat history and file caching to ensure a smooth experience.
import os
import gc
import uuid
import tempfile
import base64
from dotenv import load_dotenv
from rag_code import Transcribe, EmbedData, QdrantVDB_QB, Retriever, RAG
import streamlit as st
if "id" not in st.session_state:
st.session_state.id = uuid.uuid4()
st.session_state.file_cache = {}
session_id = st.session_state.id
collection_name = "chat with audios"
batch_size = 32
load_dotenv()
def reset_chat():
st.session_state.messages = ()
st.session_state.context = None
gc.collect()
with st.sidebar:
    st.header("Add your audio file!")
    uploaded_file = st.file_uploader("Choose your audio file", type=["mp3", "wav", "m4a"])

    if uploaded_file:
        try:
            with tempfile.TemporaryDirectory() as temp_dir:
                file_path = os.path.join(temp_dir, uploaded_file.name)

                with open(file_path, "wb") as f:
                    f.write(uploaded_file.getvalue())

                file_key = f"{session_id}-{uploaded_file.name}"
                st.write("Transcribing with AssemblyAI and storing in vector database...")

                if file_key not in st.session_state.get('file_cache', {}):
                    # Initialize transcriber
                    transcriber = Transcribe(api_key=os.getenv("ASSEMBLYAI_API_KEY"))

                    # Get speaker-labeled transcripts
                    transcripts = transcriber.transcribe_audio(file_path)
                    st.session_state.transcripts = transcripts

                    # Each speaker segment becomes a separate document for embedding
                    documents = [f"Speaker {t['speaker']}: {t['text']}" for t in transcripts]

                    # embed data
                    embeddata = EmbedData(embed_model_name="BAAI/bge-large-en-v1.5", batch_size=batch_size)
                    embeddata.embed(documents)

                    # set up vector database
                    qdrant_vdb = QdrantVDB_QB(collection_name=collection_name,
                                              batch_size=batch_size,
                                              vector_dim=1024)
                    qdrant_vdb.define_client()
                    qdrant_vdb.create_collection()
                    qdrant_vdb.ingest_data(embeddata=embeddata)

                    # set up retriever
                    retriever = Retriever(vector_db=qdrant_vdb, embeddata=embeddata)

                    # set up rag
                    query_engine = RAG(retriever=retriever, llm_name="DeepSeek-R1-Distill-Llama-70B")

                    st.session_state.file_cache[file_key] = query_engine
                else:
                    query_engine = st.session_state.file_cache[file_key]

                # Inform the user that the file is processed
                st.success("Ready to Chat!")

                # Display audio player
                st.audio(uploaded_file)

                # Display speaker-labeled transcript
                st.subheader("Transcript")
                with st.expander("Show full transcript", expanded=True):
                    for t in st.session_state.transcripts:
                        st.text(f"**{t['speaker']}**: {t['text']}")

        except Exception as e:
            st.error(f"An error occurred: {e}")
            st.stop()
col1, col2 = st.columns([6, 1])

with col1:
    # Page header; the AssemblyAI and DeepSeek logos are embedded inline as base64 data URIs
    st.markdown("""
    # RAG over Audio powered by <img src="data:image/png;base64,{}"> and <img src="data:image/png;base64,{}">
    """.format(base64.b64encode(open("assets/AssemblyAI.png", "rb").read()).decode(),
               base64.b64encode(open("assets/deep-seek.png", "rb").read()).decode()),
               unsafe_allow_html=True)

with col2:
    st.button("Clear ↺", on_click=reset_chat)
# Initialize chat history
if "messages" not in st.session_state:
    reset_chat()
# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("Ask about the audio conversation..."):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)

    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""

        # Get streaming response
        streaming_response = query_engine.query(prompt)

        for chunk in streaming_response:
            try:
                new_text = chunk.raw["choices"][0]["delta"]["content"]
                full_response += new_text
                message_placeholder.markdown(full_response + "▌")
            except:
                pass

        message_placeholder.markdown(full_response)

    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": full_response})
Run the app.py file from the terminal with the following command; you can then upload an audio file and interact with the chatbot.
streamlit run app.py
You can see a demo of the application here, and you can download the sample audio file here.
Conclusion
We have successfully combined AssemblyAI, SambaNova Cloud, Qdrant, and DeepSeek-R1 to build a chatbot that uses Retrieval Augmented Generation over audio. The rag_code.py file manages the RAG workflow, while the app.py file provides a simple Streamlit interface. I encourage you to interact with this chatbot using different audio files, tweak the code, add new features, and explore the endless possibilities of audio-based chat solutions.
GITHUB repo: https://github.com/karthikponna/chat_with_audios/tree/main
Key takeaways
- Leveraging AssemblyAI for audio transcription produces accurate, speaker-labeled text, providing a solid foundation for advanced conversational experiences.
- Qdrant integration guarantees rapid vector-based retrieval, offering quick access to the relevant context for more informed responses.
- Applying a RAG approach combines retrieval and generation, ensuring answers are grounded in real data.
- Using SambaNova Cloud for the LLM offers robust language understanding, powering engaging, context-aware interactions.
- Using Streamlit for the user interface provides a straightforward, interactive environment that simplifies deploying an audio-based chatbot.
The media shown in this article is not owned by Analytics Vidhya and is used at the author's discretion.
Frequently asked questions
Q. What is RAG and why is it used here?
A. RAG stands for Retrieval Augmented Generation. It fetches relevant data from a vector database, ensuring that chatbot responses are grounded in real context rather than relying on model predictions alone.
Q. How do I use a different embedding model?
A. Simply change the embed_model_name in the EmbedData class to your preferred Hugging Face model, ensuring it supports text embeddings.
Q. How can I customize the chatbot's prompt?
A. Adjust the qa_prompt_tmpl_str in the RAG class to include any additional instructions or formatting your application needs.
Q. Why use Qdrant for this project?
A. Qdrant provides efficient vector search, making it easy to find the relevant context within large sets of embedded text.