In the first part of the story, we used a free Google Colab instance to run a Mistral-7B model and extract information using a FAISS (Facebook AI Similarity Search) database. In this part, we will go further and show how to run a LLaMA 2 13B model. We will also try some additional features of LangChain, such as creating chat-based applications and using agents. As in the first part, all the components used are based on open-source projects and are completely free to use.
Let's get into it!
LLaMA.cpp
LLaMA.CPP is a very interesting open-source project, originally designed to run LLaMA models on MacBooks, but its functionality has grown far beyond that. First, it is written in plain C/C++ with no external dependencies and can run on almost any hardware (CUDA, OpenCL, and Apple Silicon are supported; it can even run on a Raspberry Pi). Second, LLaMA.CPP can connect with LangChain, which allows us to try many LangChain features for free without an OpenAI key. Last but not least, because LLaMA.CPP runs everywhere, it is a good candidate for a free Google Colab instance. As a reminder, Google offers free access to Python notebooks with 12 GB of RAM and 16 GB of VRAM, which can be opened from the Google Colaboratory page. The code opens in the web browser and runs in the cloud, so it is accessible to everyone, even from a budget, minimalist PC.
Before using LLaMA, let's install the library. The installation itself is simple; we just need to enable LLAMA_CUBLAS before running pip:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
!pip3 install sentence-transformers langchain langchain-experimental
!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir /content --local-dir-use-symlinks False
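As a side note, the chat-tuned checkpoint downloaded above expects prompts in Llama 2's instruction format: the user message wrapped in [INST] … [/INST] tags, with an optional <<SYS>> block for the system message. Here is a minimal helper for building such a single-turn prompt; the function name is my own sketch, not part of any library:

```python
def llama2_chat_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in the Llama 2 chat format."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


prompt = llama2_chat_prompt(
    "You are a helpful assistant. Answer concisely.",
    "What is FAISS?",
)
print(prompt)
```

A string built this way can be passed directly to the model; getting the format wrong usually does not cause an error, but noticeably degrades the quality of the answers.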
For the first test, I will use a 7B model. Here I also installed the huggingface-hub library, which allows us to automatically download the "Llama-2-7b-Chat" model in the GGUF format required by LLaMA.CPP. I also installed LangChain…