Nowadays, no one is surprised to see a deep learning model running in the cloud. But the situation can be much more complicated in the world of consumer or edge devices, for several reasons. First, using cloud APIs requires devices to always be online. This is not a problem for a web service, but it can be a deal breaker for a device that needs to work without Internet access. Second, cloud APIs cost money, and customers may not be happy to pay yet another subscription fee. Last but not least, the project may be discontinued after several years: the API endpoints will be shut down, and the expensive hardware will turn into a brick, which is not friendly to customers, the ecosystem, or the environment. That's why I am convinced that end-user hardware should be fully functional offline, without additional costs or the use of online APIs (online access can be optional, but not mandatory).
In this article, I will show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. That will allow us to ask the Raspberry Pi questions and get answers. And as promised, all of this will work completely offline.
Let's get into it!
The code presented in this article is intended to work on a Raspberry Pi. But most of the methods (except the “display” part) will also work on a Windows, macOS, or Linux laptop, so readers who do not have a Raspberry Pi can easily test the code as well.
Hardware
For this project, I will use a Raspberry Pi 4. It is a single-board computer running Linux; it is small and requires only 5 V DC power, with no fans or active cooling needed.
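As a quick sanity check before going further, we can confirm which board the code is running on. Here is a minimal Python sketch; it is my own addition, and it assumes a standard Raspberry Pi OS kernel, which exposes the board name via /proc/device-tree/model:

from pathlib import Path

# The device tree exposes the board name as a NUL-terminated string;
# this file exists on standard Raspberry Pi OS kernels (an assumption
# for other distributions).
model = Path("/proc/device-tree/model").read_text().rstrip("\x00")
print(model)  # e.g. "Raspberry Pi 4 Model B Rev 1.4"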
A newer model from 2023, the Raspberry Pi 5, should be even better; according to benchmarks, it is almost 2x faster. But it is also almost 50% more expensive, and for our test, the Model 4 is good enough.