Have you ever thought about running your own large language models (LLMs) or vision language models (VLMs) on your own device? You probably have, but the thought of setting things up from scratch, having to manage the environment, downloading the right model weights, and the lingering doubt of whether your device can even handle the model has probably given you pause.
Let's go one step further. Imagine running your own LLM or VLM on a device no bigger than a credit card: a Raspberry Pi. Impossible? Not at all. I mean, I'm writing this post after all, so it's definitely possible.
Possible, yes. But why would you do it?
LLMs at the edge may seem far-fetched right now. But this particular niche use case should mature over time, and we will definitely see some neat edge solutions being deployed with a fully local generative AI solution running on-device.
It's also about pushing the limits to see what's possible. If it can be done at this end of the computing scale, then it can be done at any level between a Raspberry Pi and a large, powerful server GPU.
Traditionally, AI at the edge has been closely tied to computer vision. Exploring the deployment of LLMs and VLMs at the edge adds an interesting dimension to this emerging field.
Most importantly, I just wanted to do something fun with my recently acquired Raspberry Pi 5.
So how do we achieve all this on a Raspberry Pi? Using Ollama!
What is Ollama?
Ollama has become one of the best solutions for running local LLMs on your personal computer without having to deal with the hassle of setting things up from scratch. With just a few commands, everything can be set up without problems. Everything is self-contained and, in my experience, works great across a variety of devices and models. It even exposes a REST API for model inference, so you can leave it running on the Raspberry Pi and call it from your other apps and devices if you want.
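For instance, once a model has been pulled (we'll use Phi-2 later in this post), the REST API can be queried with a simple curl call like the sketch below. Note that reaching it from another device may require telling Ollama to listen on all interfaces, for example via the OLLAMA_HOST environment variable:

# Minimal sketch: query the Ollama REST API locally, assuming the 'phi' model is already pulled
curl http://localhost:11434/api/generate -d '{
  "model": "phi",
  "prompt": "Why is the sky blue?",
  "stream": false
}'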
There's also the Ollama Web UI, a beautiful piece of AI UI/UX that runs seamlessly with Ollama, for those wary of command-line interfaces. It's basically a local ChatGPT interface, so to speak.
Together, these two pieces of open source software provide what I believe is the best locally hosted LLM experience right now.
Both Ollama and the Ollama Web UI also support VLMs like LLaVA, which opens even more doors for this edge generative AI use case.
Technical requirements
All you need is the following:
- Raspberry Pi 5 (or a Raspberry Pi 4 for a slower setup): opt for the 8GB RAM variant to fit the 7B models.
- SD card: minimum 16 GB; the larger it is, the more models you can fit. Have it loaded with an appropriate operating system such as Raspbian Bookworm or Ubuntu.
- An internet connection
As I mentioned above, running Ollama on a Raspberry Pi is already near the extreme end of the hardware spectrum. Essentially, any device more powerful than a Raspberry Pi, as long as it runs a Linux distribution and has a similar memory capacity, should in theory be able to run Ollama and the models discussed in this post.
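If you are unsure whether your device has enough headroom, a quick sanity check of memory and storage using standard Linux tools looks something like this:

# Check available RAM (the 8GB variant is recommended for 7B models)
free -h
# Check free space on the SD card for model weights
df -h /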
1. Ollama Installation
To install Ollama on a Raspberry Pi, we will avoid using Docker to conserve resources.
In the terminal, run
curl https://ollama.ai/install.sh | sh
You should see something similar to the image below after running the above command.
As the output says, go to 0.0.0.0:11434 to verify that Ollama is running. Seeing the message 'WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode.' is expected, since we are using a Raspberry Pi. But if you're following these instructions on something that is supposed to have an NVIDIA GPU, something went wrong.
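You can also check from the terminal; assuming the default port, a plain request to the Ollama endpoint should return a short confirmation message:

# Should print "Ollama is running" if the server is up
curl http://0.0.0.0:11434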
For any issues or updates, please refer to the Ollama GitHub Repository.
2. Run LLM via command line
Take a look at the official Ollama model library for a list of models that can be run with Ollama. On an 8GB Raspberry Pi, models larger than 7B will not fit. Let's use Phi-2, a 2.7 billion parameter LLM from Microsoft, now under the MIT license.
We'll use the default Phi-2 model, but feel free to use any of the other tags found here. Take a look at the model page for Phi-2 to see how you can interact with it.
In the terminal, run
ollama run phi
Once you see something similar to the result below, you'll have an LLM running on Raspberry Pi! It's that easy.
You can try other models like Mistral, Llama-2, etc.; just make sure there is enough space on the SD card for the model weights.
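For reference, pulling and managing additional models from the command line looks something like the sketch below (the model names are examples from the Ollama library):

# Download another model's weights without starting a chat session
ollama pull mistral
# List the models currently stored on the SD card
ollama list
# Remove a model you no longer need to free up space
ollama rm mistral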
Naturally, the larger the model, the slower the generation. On Phi-2 2.7B, I get around 4 tokens per second. But with Mistral 7B, the generation speed drops to around 2 tokens per second. A token is roughly equivalent to a single word.
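If you want to measure this on your own setup, Ollama can print generation statistics, including an eval rate in tokens per second, when run with the verbose flag (assuming your Ollama version supports it):

# Print timing statistics (e.g., eval rate in tokens/s) after each response
ollama run phi --verbose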
We now have LLMs running on the Raspberry Pi, but we're not done yet. The terminal isn't for everyone. Let's get the Ollama Web UI up and running too!
3. Installing and running Ollama Web UI
We will follow the instructions in the official Ollama Web UI GitHub repository to install it without Docker. It recommends Node.js version 20.10 or newer, so we will follow that. It also recommends Python 3.11 or newer, which Raspbian OS already has installed for us.
First we have to install Node.js. In the terminal, run
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - &&\
sudo apt-get install -y nodejs
Future readers: please change version 20.x to a more current version if necessary.
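Once that finishes, a quick sanity check of the installed versions (a small sketch) can save some debugging later:

# Confirm Node.js and Python meet the recommended versions (>= 20.10 and 3.11)
node -v
python3 --version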
Then run the code block below.
git clone https://github.com/ollama-webui/ollama-webui.git
cd ollama-webui/

# Copying required .env file
cp -RPp example.env .env
# Building Frontend Using Node
npm i
npm run build
# Serving Frontend with the Backend
cd ./backend
pip install -r requirements.txt --break-system-packages
sh start.sh
It is a slight modification of what is provided on GitHub. Please note that for simplicity and brevity we are not following best practices such as using virtual environments, and we use the --break-system-packages flag. If you encounter an error like uvicorn not found, please restart the terminal session.
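If restarting the session does not help, one possible cause, and this is an assumption on my part rather than something from the Web UI docs, is that pip placed the uvicorn executable in ~/.local/bin, which may not be on your PATH. A quick sketch of a workaround:

# Assumption: pip installed user-level scripts to ~/.local/bin, which may not be on PATH
export PATH="$HOME/.local/bin:$PATH"
sh start.sh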
If everything goes correctly, you should be able to access the Ollama Web UI on port 8080 via http://0.0.0.0:8080 on the Raspberry Pi, or via http://<raspberry_pi_ip_address>:8080/ if you are accessing it from another device on the same network.
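If you don't know the Raspberry Pi's address on your local network, you can look it up directly on the device:

# Print the Raspberry Pi's IP address(es) on the local network
hostname -I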
Once you've created an account and logged in, you should see something similar to the image below.
If you had downloaded some model weights previously, you should see them in the dropdown menu as shown below. If not, you can go to settings to download a model.
The entire interface is very clean and intuitive, so I won't explain much about it. It's really a very well done open source project.
4. Run VLM via Ollama Web UI
As I mentioned at the beginning of this article, we can also run VLMs. Let's run LLaVA, a popular open source VLM that is also supported by Ollama. To do this, download the weights by pulling 'llava' through the interface.
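Alternatively, if you prefer the command line, the same weights can be pulled directly through Ollama using the 'llava' tag from the model library:

# Download the LLaVA weights so the model shows up in the Web UI dropdown
ollama pull llava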
Unfortunately, unlike LLMs, VLMs take a long time to interpret images on the Raspberry Pi. The example below took about 6 minutes to process. Most of that time is probably because the image-processing side is not yet properly optimized, but this will definitely change in the future. Token generation speed is around 2 tokens/second.
Wrapping it all up
At this point, we have pretty much accomplished the objectives of this article. To summarize, we have used Ollama and the Ollama Web UI to run LLMs and VLMs such as Phi-2, Mistral, and LLaVA on the Raspberry Pi.
I can definitely imagine quite a few use cases for locally hosted LLMs running on a Raspberry Pi (or another small edge device), especially since 4 tokens/second seems like an acceptable speed with streaming for some use cases, if we're looking at models around the size of Phi-2.
The field of “small” LLMs and VLMs, somewhat paradoxically named given their “large” designation, is an active area of research with quite a few model releases recently. Let's hope this emerging trend continues and more efficient and compact models continue to be released! Definitely something to keep an eye on in the coming months.
Disclaimer: I have no affiliation with Ollama or Ollama Web UI. All views and opinions are my own and do not represent any organization.