Learn how to use Google's latest Gemini-1.5-pro model to develop a generative AI application for counting calories
Have you ever wondered how many calories you consume when you eat dinner, for example? I do it all the time. Wouldn't it be wonderful if you could just run a photo of your plate through an app and get an estimate of the total calorie count before deciding how many you want to consume?
This calorie counting app I created can help you do just that. It's a Python app that uses Google's Gemini-1.5-Pro-Latest model to calculate the calorie count of foods.
The app accepts two inputs: a question about food and a picture of the food or foods, or simply a plate of food. It displays an answer to the question, the total calorie count in the picture, and a breakdown of the calories for each food in the picture.
In this article, I will explain the end-to-end process of building the app from scratch using Google's Gemini-1.5-pro-latest (a large generative AI model released by Google), and how I developed the front-end of the app using Streamlit.
It is worth noting that, with the advancements in AI, data scientists may need to gradually move from traditional deep learning toward generative AI techniques to transform their role. That is my main reason for writing about this topic.
Let me start by briefly explaining Gemini-1.5-pro-latest and the Streamlit framework, as they are the core components of this calorie counter app.
Gemini-1.5-pro-latest is an advanced AI model developed by Google. As the latest version at the time of writing, it is described as improving on previous versions, for example with faster response times and better accuracy in natural language processing tasks.
It is a multimodal model that works with both text and images, an advance on the Google Gemini-pro model, which only works with text prompts.
Given an input, the model generates human-like text in response. In this article, it will be used to generate the text for our calorie counter app.
Gemini-1.5-pro-latest can be integrated into other applications to add AI capabilities to them. In this application, the model is asked to identify the individual food items in the uploaded image, estimate the calorie count of each one from what it has learned about those foods (rather than from a lookup in a nutritional database), and then sum up the calories of all the foods in the image.
Streamlit is an open source Python framework for building the user interface. It simplifies web development, so you do not need to write any HTML or CSS for the interface in this project.
Let's dive into creating the app.
I will show you how to create the app in 5 clear steps.
1. Set up your folder structure
To get started, open your favorite code editor (mine is VS Code) and create a project folder. Call it Calories-Counter, for example. This will be your current working directory. Create a virtual environment (venv), activate it in your terminal, and then create the following files: .env, calories.py, requirements.txt.
Here is a recommendation on what your folder structure should look like:
Calories-Counter/
├── venv/
│ ├── xxx
│ ├── xxx
├── .env
├── calories.py
└── requirements.txt
Please note that the Python library used to call Gemini-1.5-Pro is intended for Python versions 3.9 and above.
2. Get the Google API key
Like other Gemini models, Gemini-1.5-pro-latest is currently free for public use. To access it, you need an API key, which you can get from Google AI Studio by clicking “Get API Key”. Once the key is generated, copy it for later use in your code. Save this key as an environment variable in the .env file as follows.
GOOGLE_API_KEY="paste the generated key here"
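Once the key is in .env, it can help to fail early with a clear message if it is missing or still set to the placeholder. The sketch below is only an illustration and is not part of the original app: `require_api_key` is a hypothetical helper name, and it accepts a plain mapping so it can be tried without touching the real environment.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error.

    `require_api_key` is a hypothetical helper, not part of any library.
    """
    # Fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    # Treat an empty value or the untouched placeholder as "not configured"
    if not key or key == "paste the generated key here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

In the app, this could be called after the .env file has been loaded and before the key is handed to the Gemini library.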
3. Install dependencies
List the following libraries in your requirements.txt file, one per line:
- streamlit
- google-generativeai
- python-dotenv
In the terminal, install the libraries in requirements.txt with:
python -m pip install -r requirements.txt
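After installing, a quick check that the modules can be found may save some debugging later. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the module names passed to it are the import names used later in calories.py (PIL is provided by the Pillow package, which is typically installed alongside Streamlit).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names used by the app; an empty list means all were found
    print(missing_modules(["streamlit", "google.generativeai", "dotenv", "PIL"]))
```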
4. Write the Python script
Now, let's start writing the Python script in calories.py. Using the following code, import all the necessary libraries:
# import the libraries
from dotenv import load_dotenv
import streamlit as st
import os
import google.generativeai as genai
from PIL import Image
Here's how the various imported modules will be used:
- dotenv: since the application reads the Google API key from an environment variable, dotenv is used to load that variable from the .env file.
- streamlit: used to create an interactive user interface for the front-end.
- os: used to read the API key from the environment once the .env file has been loaded.
- google.generativeai: gives us access to the Gemini model we are about to use.
- PIL: the Python imaging library (installed as the Pillow package), used here to open the uploaded image so it can be displayed.
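The following lines load the variables from the .env file into the environment and then configure the library with the API key. Note that load_dotenv() has to run before the key is read.
load_dotenv()  # load the variables defined in .env
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))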
Define a function that, when called, loads Gemini-1.5-pro-latest and gets the response as follows:
def get_gemini_reponse(input_prompt,image,user_prompt):
    # Load the model
    model=genai.GenerativeModel('gemini-1.5-pro-latest')
    # Send the prompt, the first image part, and the user's question together
    response=model.generate_content([input_prompt,image[0],user_prompt])
    return response.text
The function above takes three inputs: the input prompt that will be specified later in the script, the image provided by the user, and the question provided by the user. All three are passed to the Gemini model, and the function returns the response text.
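A model call can fail, for instance because of a network problem or because no text comes back. The sketch below shows one possible way to keep the app from crashing in that case; it is an assumption-laden illustration, not the author's code. `ask_model` is a hypothetical name, and the model is passed in as an argument (any object with a `generate_content` method) so the function can be exercised without a real API key.

```python
def ask_model(model, input_prompt, image_part, user_prompt):
    """Return the model's text, or a readable message if the call fails.

    `model` is assumed to expose generate_content(list) and return an
    object with a .text attribute, as used in the function above.
    """
    try:
        response = model.generate_content([input_prompt, image_part, user_prompt])
        return response.text
    except Exception as exc:  # deliberately broad: this is only a sketch
        return f"Could not get a response: {exc}"
```

Catching every exception is a simplification; a real app might handle specific error types differently.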
Since the image is passed to Gemini-1.5-pro as raw bytes together with its MIME type, the next thing to do is to write a function that reads the uploaded image and converts it to bytes.
def input_image_setup(uploaded_file):
    # Check if a file has been uploaded
    if uploaded_file is not None:
        # Read the file into bytes
        bytes_data = uploaded_file.getvalue()
        # Wrap the image in a list of parts, as expected by get_gemini_reponse
        image_parts = [
            {
                "mime_type": uploaded_file.type, # Get the mime type of the uploaded file
                "data": bytes_data
            }
        ]
        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")
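The same list-of-parts structure can be built outside Streamlit, for example when trying things out from a script. The sketch below assumes you already have the image bytes and a file name; `build_image_parts` is a hypothetical helper, and guessing the MIME type from the file extension is a simplification of what the uploader reports.

```python
import mimetypes


def build_image_parts(data, filename, mime_type=None):
    """Build a one-element list holding the image bytes and MIME type."""
    # Guess the MIME type from the extension when none is supplied
    mime_type = mime_type or mimetypes.guess_type(filename)[0]
    if not data:
        raise ValueError("The uploaded file is empty")
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported file type: {mime_type}")
    return [{"mime_type": mime_type, "data": data}]
```

Rejecting non-image types before the model is called keeps the error message close to the cause.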
Next, specify the input prompt that determines the behavior of your app. Here, we are simply telling Gemini what to do with the text and image that the user provides to the app.
input_prompt="""
You are an expert nutritionist.
You should answer the question entered by the user in the input based on the uploaded image you see.
You should also look at the food items found in the uploaded image and calculate the total calories.
Also, provide the details of every food item with calories intake in the format below:

1. Item 1 - no of calories
2. Item 2 - no of calories
----
----
"""
The next step is to initialize Streamlit and create a simple user interface for your calorie counter app.
st.set_page_config(page_title="Gemini Calorie Counter App")
st.header("Calorie Counter App")
input=st.text_input("Ask any question related to your food: ",key="input")
uploaded_file = st.file_uploader("Upload an image of your food", type=["jpg", "jpeg", "png"])
image=""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image.", use_column_width=True) #show the image

submit=st.button("Submit & Process") #creates a "Submit & Process" button
The above steps include all elements of the application. At this point, the user can open the application, enter a question, and upload an image.
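Before the inputs are sent to the model, it may be worth checking that they are actually present, since a missing upload would otherwise surface as an exception. The sketch below is one possible check and not part of the original app; `validate_inputs` is a hypothetical helper, and treating the question as required is an assumption.

```python
def validate_inputs(question, uploaded_file):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if uploaded_file is None:
        problems.append("Please upload an image of your food.")
    # Assumption: a question is required; blank or whitespace-only is rejected
    if not question or not question.strip():
        problems.append("Please enter a question about your food.")
    return problems
```

In the Streamlit script, each returned message could be shown with `st.warning` and the model call skipped whenever the list is not empty.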
Finally, let’s put all the pieces together so that once the “Submit & Process” button is clicked, the user gets the required response text.
# Once the "Submit & Process" button is clicked
if submit:
    image_data=input_image_setup(uploaded_file)
    response=get_gemini_reponse(input_prompt,image_data,input)
    st.subheader("The Response is")
    st.write(response)
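The prompt asks for one line per food in the form "1. Item 1 - no of calories". If the response follows that format, the per-item figures can be extracted and summed to cross-check the total the model reports. The sketch below assumes that exact line shape; the model is free to word its answer differently, in which case lines are simply skipped. `parse_calorie_lines` is a hypothetical helper.

```python
import re

# Matches lines such as "1. Rice - 200 calories"
ITEM_LINE = re.compile(r"^\s*\d+\.\s*(?P<item>.+?)\s*-\s*(?P<kcal>\d+(?:\.\d+)?)")


def parse_calorie_lines(text):
    """Return (item, calories) pairs for every line in the expected format."""
    items = []
    for line in text.splitlines():
        match = ITEM_LINE.match(line)
        if match:
            items.append((match.group("item"), float(match.group("kcal"))))
    return items


sample = "1. Rice - 200 calories\n2. Grilled chicken - 165 calories"
items = parse_calorie_lines(sample)
total = sum(kcal for _, kcal in items)  # sum of the per-item estimates
```

A mismatch between this sum and the total stated in the response would be a hint that the answer deserves a second look.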
5. Run the script and interact with your application
Now that the application development is complete, you can run it in the terminal using the command:
streamlit run calories.py
To interact with your app and see how it works, view your Streamlit app in your browser using the local URL or the generated network URL.
This is what your Streamlit app looks like when it first opens in the browser.
Once the user asks a question and uploads an image, this is what is displayed:
Once the user presses the “Submit & Process” button, the response in the image below is generated at the bottom of the screen.
For external access, consider deploying your application using a cloud service such as AWS, Heroku, or Streamlit Community Cloud. Here, we will use Streamlit Community Cloud to deploy the application for free.
At the top right of the app screen, click “Deploy” and follow the instructions to complete the deployment.
After deployment, you can share the generated app URL with other users.
Like other AI applications, the results generated are the model's best estimates, so before you fully rely on the application, consider the following potential risks:
- The calorie counter app may misclassify certain foods and therefore report an incorrect number of calories.
- The app has no reference point for estimating the portion size of the food in the uploaded image. This can lead to errors.
- Over-reliance on the app can lead to stress and mental health issues, as one may become obsessed with counting calories and worry about results that may not be very accurate.
To help reduce the risks associated with using the calorie counter, here are possible improvements that could be integrated into its development:
- Add contextual analysis of the image, which will help measure the portion size of the food being analyzed. For example, the application could be designed in such a way that a standard object, such as a spoon, included in the image of the food can be used as a reference point to measure the size of the food. This will reduce errors in the resulting total calories.
- Google could improve the diversity of foods in its training set to reduce classification errors. It could expand it to include foods from more cultures, so that even rare African foods can be identified.
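The reference-object idea above comes down to simple proportional arithmetic: if an object of known length appears in the photo, its length in pixels gives a scale for everything else in the same plane. The sketch below only illustrates that arithmetic. `pixels_to_cm` is a hypothetical helper, the pixel measurements are assumed to come from some other step that is not shown, and perspective and depth are ignored.

```python
def pixels_to_cm(length_px, reference_px, reference_cm):
    """Convert a pixel length to centimetres using a reference object.

    reference_px: measured length of the reference object in pixels
    reference_cm: its known real-world length in centimetres
    """
    if reference_px <= 0 or reference_cm <= 0:
        raise ValueError("reference measurements must be positive")
    return length_px * reference_cm / reference_px


# Example: a 15 cm spoon spans 300 px, and the plate spans 600 px
plate_width_cm = pixels_to_cm(600, reference_px=300, reference_cm=15.0)
```

Turning such a size estimate into a portion weight would still need further assumptions about the food's depth and density.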