Ever wished you had a personal tutor to help you solve tricky math problems? In this article, we’ll explore how to build a math problem solver chat app using LangChain, Gemma 2 9B, Llama 3.2 Vision, and Streamlit. Our app will not only understand and solve text-based math problems but will also handle image-based questions. Let’s look at the problem statement and explore how to approach and solve it step by step.
Learning Outcomes
- Learn to create a powerful, interactive Chat App using LangChain to integrate external tools and solve tasks.
- Master the process of building a Chat App with LangChain that can efficiently solve complex math problems.
- Explore the use of APIs and environment variables to securely interact with large language models.
- Gain hands-on experience in designing a user-friendly web app with dynamic question-solving capabilities.
- Discover techniques for seamless interaction between frontend interfaces and backend AI models.
This article was published as a part of the Data Science Blogathon.
Defining the Challenge: Business Case and Objectives
We are an edtech company looking to develop an innovative AI-powered application that can solve both text-based and image-based math problems in real time. The app should provide solutions with step-by-step explanations to enhance learning and engagement for students, educators, and independent learners.
We are tasking you with designing and building this application using the latest AI technologies. The app must be scalable, user-friendly, and capable of processing both textual inputs and images with a seamless experience.
Proposed Solution: Approach and Implementation Strategy
We will now discuss the components of the proposed solution:
Gemma 2 9B IT
It is an open-source large language model from Google designed to process and generate human-like text with remarkable accuracy. In this application:
- Role: It serves as the “brain” for solving math problems presented in text format.
- How It Works: When a user inputs a text-based math problem, Gemma 2 9B understands the question, applies the necessary mathematical logic, and generates a solution.
Llama 3.2 Vision
It is an open-source model from Meta AI, capable of processing and analyzing images, including handwritten or printed math problems.
- Role: Enables the app to “see” and interpret math problems provided in image format and generate the response.
- How It Works: When users upload an image, the Llama 3.2 Vision model identifies the mathematical expressions or questions within it and converts them into a format suitable for problem-solving.
LangChain
It is a framework specifically designed for building applications that involve interactions between language models and external systems.
- Role: Acts as the intermediary between the app’s interface and the ai models, managing the flow of information.
- How It Works:
- It coordinates how the user’s input (text or image) is processed.
- It ensures the smooth exchange of data between Gemma2-9B, Llama 3.2 Vision Model, and the app interface.
Streamlit
It is an open-source Python library for creating interactive web applications quickly and easily.
- Role: It is used to build the app’s frontend entirely in Python.
- How It Works:
- Developers can use Streamlit to design and deploy a web interface where users input text or upload images.
- The interface interacts seamlessly with LangChain and the underlying ai models to display results.
Visualizing the Approach: Flow Diagram of the Solution
The process begins by setting up the environment, checking the Groq API key, and configuring the Streamlit page settings. It then initializes the text LLM (ChatGroq) and integrates tools like Wikipedia and a Calculator to enhance the text agent’s capabilities. A welcome message and sidebar navigation guide the user through the interface, where they can input either text or image-based queries. The text section collects user questions and processes them using the text agent, which utilizes the LLM and external tools to generate answers. Similarly, for image queries, the image section allows users to upload images, which are then processed by the Llama 3.2 Vision model through the Groq client.
Once the text or image query is processed, the respective agent generates and displays the appropriate answers. The system ensures smooth interaction by alternating between handling text and image queries. After displaying the answers, the process concludes, and the system is ready for the next query. This flow creates an intuitive, multi-modal experience where users can ask both text and image-based questions, with the system providing accurate and efficient responses.
Setting Up the Foundation
Setting up the foundation is a crucial step in ensuring a seamless integration of tools and processes, laying the groundwork for the successful operation of the system.
Environment Setup
First things first, set up your development environment. Make sure you have Python installed and create a virtual environment to keep your project dependencies organized.
# Create a virtual environment
python -m venv env

# Activate it on Windows
.\env\Scripts\activate

# Activate it on macOS/Linux
source env/bin/activate
Install Dependencies
Install the necessary libraries using the project’s requirements file:
pip install -r https://raw.githubusercontent.com/Gouravlohar/Math-Solver/refs/heads/master/requirements.txt
Get the Groq API Key
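Sign up on the Groq console, generate an API key, and store it in a .env file at the root of your project so the app can load it securely. A minimal .env looks like this (the value shown is a placeholder):

# .env
GROQ_API_KEY=your_groq_api_key_here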
Import Necessary Libraries
import streamlit as st
import os
import base64
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.chains import LLMMathChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents.agent_types import AgentType
from langchain.agents import Tool, initialize_agent
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler
from groq import Groq
These imports collectively set up the necessary libraries and modules to create a Streamlit web application that interacts with language models for solving mathematical problems and answering questions based on text and image inputs.
Load Environment Variables
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    st.error("Groq API Key not found in .env file")
    st.stop()
This section of the code loads the environment variables and ensures that the Groq API key needed for authentication is available.
Set Up Both LLMs
st.set_page_config(page_title="Math Solver", page_icon="👨‍🔬")
st.title("Math Solver")
llm_text = ChatGroq(model="gemma2-9b-it", groq_api_key=groq_api_key)
llm_image = ChatGroq(model="llama-3.2-90b-vision-preview", groq_api_key=groq_api_key)
This section of the code sets up the Streamlit application by configuring its page title and icon. It then initializes two language models via ChatGroq: llm_text for handling text-based questions with the “gemma2-9b-it” model, and llm_image for handling questions that include images with the “llama-3.2-90b-vision-preview” model. Both models are authenticated using the previously retrieved Groq API key.
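As a quick sanity check (a hypothetical snippet, not part of the app), you can invoke the text model directly before building the agent on top of it:

# Hypothetical smoke test: confirm the text model responds
reply = llm_text.invoke("What is 7 squared?")
print(reply.content)  # should print an answer containing 49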
Initialize Tools and Prompt Template
wikipedia_wrapper = WikipediaAPIWrapper()
wikipedia_tool = Tool(
    name="Wikipedia",
    func=wikipedia_wrapper.run,
    description="A tool for searching the Internet to find various information on the topics mentioned."
)
math_chain = LLMMathChain.from_llm(llm=llm_text)
calculator = Tool(
    name="Calculator",
    func=math_chain.run,
    description="A tool for solving mathematical problems. Provide only the mathematical expressions."
)
prompt = """
You are a mathematical problem-solving assistant tasked with helping users solve their questions. Arrive at the solution logically, providing a clear and step-by-step explanation. Present your response in a structured point-wise format for better understanding.
Question: {question}
Answer:
"""
prompt_template = PromptTemplate(
    input_variables=["question"],
    template=prompt
)

# Combine the LLM and prompt into a chain for text questions
chain = LLMChain(llm=llm_text, prompt=prompt_template)
reasoning_tool = Tool(
    name="Reasoning Tool",
    func=chain.run,
    description="A tool for answering logic-based and reasoning questions."
)

# Initialize the agent for text questions
assistant_agent_text = initialize_agent(
    tools=[wikipedia_tool, calculator, reasoning_tool],
    llm=llm_text,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    handle_parsing_errors=True
)
This part of the code initializes the tools required to handle text-based questions in the Streamlit application. It sets up a Wikipedia search tool using the WikipediaAPIWrapper, which lets the application fetch information from the internet, and a Calculator tool built on the LLMMathChain class, which uses the llm_text model to evaluate mathematical expressions. It also defines a prompt template that structures questions and expected answers in a clear, step-by-step manner, guiding the language model to generate a logical, well-explained response to each user query. Finally, it combines all three tools into a zero-shot ReAct agent for text questions.
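If you want to sanity-check the agent outside Streamlit, a minimal standalone test like the following can help (a sketch assuming the same setup as above, not part of the app code):

# Hypothetical standalone check of the text agent
result = assistant_agent_text.run("If a train travels 60 km in 45 minutes, what is its speed in km/h?")
print(result)  # the step-by-step reasoning should arrive at 80 km/h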
Streamlit Session State
if "messages" not in st.session_state:
st.session_state("messages") = (
{"role": "assistant", "content": "Welcome! I am your Assistant. How can I help you today?"}
)
for msg in st.session_state.messages:
if msg("role") == "user" and "image" in msg:
st.chat_message(msg("role")).write(msg('content'))
st.image(msg("image"), caption='Uploaded Image', use_column_width=True)
else:
st.chat_message(msg("role")).write(msg('content'))
The code initializes the chat history in the session state if it does not exist, starting with a default welcome message from the assistant. It then loops through the stored messages and renders each one in the chat interface. For a user message that carries an image, both the text content and the uploaded image are rendered, with a caption on the image; otherwise, only the text content is displayed. This keeps the full conversation, including any uploaded images, visible in the chat interface.
Sidebar and Response Cleaning
st.sidebar.header("Navigation")
if st.sidebar.button("Text Question"):
    st.session_state["section"] = "text"
if st.sidebar.button("Image Question"):
    st.session_state["section"] = "image"
if "section" not in st.session_state:
    st.session_state["section"] = "text"

def clean_response(response):
    if "```" in response:
        response = response.split("```")[1].strip()
    return response
This section of code builds the sidebar navigation between the Text and Image sections, and the clean_response function strips code fences from the LLM’s response.
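For instance, if the model wraps its answer in a fenced code block, clean_response returns only the inner text (a quick illustration):

# Example: strip a fenced code block from a model reply
raw = "Here is the result:\n```\nAnswer: 42\n```"
print(clean_response(raw))  # prints: Answer: 42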
Processing Text-Based Inquiries
Processing text-based inquiries focuses on handling and addressing user questions in text form, utilizing language models to generate precise responses based on the input provided.
if st.session_state("section") == "text":
st.header("Text Question")
st.write("Please enter your mathematical question below, and I will provide a detailed solution.")
question = st.text_area("Your Question:", "Example: I have 5 apples and 3 oranges. If I eat 2 apples, how many fruits do I have left?")
if st.button("Get Answer"):
if question:
with st.spinner("Generating response..."):
st.session_state.messages.append({"role": "user", "content": question})
st.chat_message("user").write(question)
st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
try:
response = assistant_agent_text.run(st.session_state.messages, callbacks=(st_cb))
cleaned_response = clean_response(response)
st.session_state.messages.append({'role': 'assistant', "content": cleaned_response})
st.write('### Response:')
st.success(cleaned_response)
except ValueError as e:
st.error(f"An error occurred: {e}")
else:
st.warning("Please enter a question to get an answer.")
This section of the code handles the “Text Question” section of the Streamlit application. When the section is active, it shows a header and a text area for entering a math question. On clicking the “Get Answer” button, if a question has been entered, a spinner indicates that a response is being generated; the question is appended to the session-state messages and rendered in the chat interface before the agent produces the answer.
Processing Image-Based Inquiries
Processing image-based inquiries involves analyzing and interpreting images uploaded by users, using advanced models to generate accurate responses or insights based on the visual content.
elif st.session_state["section"] == "image":
    st.header("Image Question")
    st.write("Please enter your question below and upload an image. I will provide a detailed solution.")
    question = st.text_area("Your Question:", "Example: What will be the answer?")
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if st.button("Get Answer"):
        if question and uploaded_file is not None:
            with st.spinner("Generating response..."):
                image_data = uploaded_file.read()
                image_data_url = f"data:image/jpeg;base64,{base64.b64encode(image_data).decode()}"
                st.session_state.messages.append({"role": "user", "content": question, "image": image_data})
                st.chat_message("user").write(question)
                st.image(image_data, caption="Uploaded Image", use_column_width=True)
This section of the code handles the “Image Question” functionality in the Streamlit application. When the “Image Question” section is active, it displays a header, a text area for users to input their questions, and an option to upload an image. Upon clicking the “Get Answer” button, if both a question and an image are provided, it shows a spinner indicating that a response is being generated. The uploaded image is read and encoded in base64 format. The user’s question and the image data are appended to the session state messages and displayed in the chat interface, with the image shown alongside the question. This setup ensures that both the text and image inputs are correctly captured and displayed for further processing.
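One small caveat: the data URL above hardcodes image/jpeg even when a PNG is uploaded. A possible refinement (not in the original code) derives the MIME type from the file Streamlit reports:

# Optional refinement: use the MIME type reported by the file uploader
mime_type = uploaded_file.type or "image/jpeg"  # e.g. "image/png"
image_data_url = f"data:{mime_type};base64,{base64.b64encode(image_data).decode()}"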
Initialize Groq Client for Llama 3.2 Vision Model
client = Groq()
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": question
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    }
]
This section prepares the multimodal message, pairing the user’s question with the base64-encoded image, for the Llama vision model.
Groq API Call
try:
    completion = client.chat.completions.create(
        model="llama-3.2-90b-vision-preview",
        messages=messages,
        temperature=1,
        max_tokens=1024,
        top_p=1,
        stream=False,
        stop=None,
    )
This setup sends the user’s question and image to the Groq API, which processes the inputs using the specified model and returns a generated response.
Response from Image Model
    response = completion.choices[0].message.content
    cleaned_response = clean_response(response)
    st.session_state.messages.append({"role": "assistant", "content": cleaned_response})
    st.write("### Response:")
    st.success(cleaned_response)
except ValueError as e:
    st.error(f"An error occurred: {e}")
else:
    st.warning("Please enter a question and upload an image to get an answer.")
This section of the code processes the response from the Groq API after generating a completion. It extracts the content of the response from the first choice in the completion result and cleans it using the clean_response function. The system appends the cleaned response to the session state messages with the role of “assistant” and displays it in the chat interface. The response appears under a “Response” header with a success message. If a ValueError occurs, the system displays an error message. If either the question or the image is not provided, a warning prompts the user to enter both to get an answer.
Check the full code in the GitHub repo here.
Output
Input for Text Section
A tank has three pipes attached to it. Pipe A can fill the tank in 4 hours, Pipe B can fill it in 6 hours, and Pipe C can empty the tank in 3 hours. If all three pipes are opened together, how long will it take to fill the tank completely?
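For reference, Pipe A fills 1/4 of the tank per hour and Pipe B fills 1/6, while Pipe C empties 1/3 per hour, so the combined rate is 1/4 + 1/6 − 1/3 = 3/12 + 2/12 − 4/12 = 1/12 of the tank per hour, and the tank fills in 12 hours. The app’s step-by-step response should arrive at this result.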
Input for Image Section
Conclusion
By combining the powers of Gemma 2 9B, Llama 3.2 Vision, LangChain, and Streamlit, it is possible to create a robust and user-friendly math problem-solving app that can revolutionize how students learn and engage with mathematics, providing step-by-step solutions and real-time feedback. This not only helps learners get past the complexity of mathematical concepts but, more importantly, offers a scalable and accessible solution for learners at all levels.
This is one example of the many ways large language models and AI can be used in education. As we continue to develop these technologies, even more creative and impactful applications will emerge to change how we learn and teach.
What do you think of such a concept? Have you ever tried to develop AI-based edutainment applications? Share your experiences and ideas in the comments below!
Key Takeaways
- You can build a powerful math problem solver using advanced AI models like Gemma 2 9B and Llama 3.2.
- Combine text and image processing to create an app that can handle various types of math problems.
- Learn how to integrate LangChain with various tools to create a powerful Math Problem Solver Chat App that enhances user experience.
- Leverage Groq acceleration to ensure your app delivers quick responses.
- Streamlit makes it easy to build an intuitive and engaging user interface.
- Consider the ethical implications and design your app to promote learning and understanding.
Frequently Asked Questions
Q1. What is Gemma 2 9B?
A. Gemma 2 9B is a powerful language model developed by Google, capable of understanding and solving complex math problems presented in text form.
Q2. How does the app solve image-based math problems?
A. The app uses the Meta Llama 3.2 vision model to interpret math problems in images. It extracts the problem from the image and generates the response.
Q3. Can the app show step-by-step solutions?
A. Yes, you can design the app to display the steps involved in solving a problem, which can be a valuable learning tool for users.
Q4. What ethical considerations apply to an app like this?
A. It’s important to ensure the app is used responsibly and doesn’t facilitate cheating or hinder genuine learning. Design features that promote understanding and encourage users to engage with the problem-solving process.
Q5. Where can I learn more about these technologies?
A. You can find more information about Gemma 2 9B, Llama 3.2, Groq, LangChain, and Streamlit on Analytics Vidhya and on their respective official websites and documentation pages.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.