Image by author
New open source models like LLaMA 2 have become quite advanced and are free to use. You can use them commercially or fine-tune them with your own data to develop specialized versions. Thanks to their ease of use, you can now run them locally on your own device.
In this post, we will learn how to download the necessary files and the LLaMA 2 model to run the CLI program and interact with an AI assistant. The setup is simple enough that even non-technical users or students can get it running by following a few basic steps.
To install llama.cpp locally, the simplest method is to download the pre-built executable from the llama.cpp releases page.
To install it on Windows 11 with an NVIDIA GPU, we must first download the llama-master-eb542d3-bin-win-cublas-(version)-x64.zip archive. After downloading, extract it to a directory of your choice. It is recommended to create a new folder and extract all the files into it.
Next, we will download the CUDA runtime files, cudart-llama-bin-win-(version)-x64.zip, and extract them into the same main directory. To use GPU acceleration, you have two options: cuBLAS for NVIDIA GPUs and CLBlast for AMD GPUs.
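For example, assuming both archives landed in your Downloads folder and C:\llama.cpp is your main directory (both paths are illustrative, not prescribed by llama.cpp), you can extract them in PowerShell:
# Extract the llama.cpp build into the main directory (paths are illustrative)
Expand-Archive -Path "$env:USERPROFILE\Downloads\llama-master-eb542d3-bin-win-cublas-(version)-x64.zip" -DestinationPath "C:\llama.cpp"
# Extract the CUDA runtime files into the same directory
Expand-Archive -Path "$env:USERPROFILE\Downloads\cudart-llama-bin-win-(version)-x64.zip" -DestinationPath "C:\llama.cpp"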
Note: (version) is the version of CUDA installed on your local system. You can check it by running nvcc --version in the terminal; for example, if the output reports "Cuda compilation tools, release 12.2", download the archive whose name contains that CUDA version.
To get started, create a folder called "Models" in the main directory. Inside the Models folder, create a new folder called "llama2_7b". Next, download the LLaMA 2 model file from the Hugging Face Hub. You can choose the version you prefer, but for this guide we will download the llama-2-7b-chat.Q5_K_M.gguf file. Once the download is complete, move the file into the "llama2_7b" folder you just created.
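If you prefer to do this from PowerShell, a minimal sketch looks like this (the Hugging Face repository TheBloke/Llama-2-7B-Chat-GGUF is an assumption; substitute the URL of whichever .gguf file you chose):
# Create the model folder (also creates Models if it does not exist)
New-Item -ItemType Directory -Path ".\Models\llama2_7b" -Force
# Download the quantized model file into it (repository URL is an assumption)
Invoke-WebRequest -Uri "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf" -OutFile ".\Models\llama2_7b\llama-2-7b-chat.Q5_K_M.gguf"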
Note: To avoid errors, be sure to download only .gguf model files before running the model.
Now you can open the terminal in the main directory by right-clicking and selecting the "Open in Terminal" option. You can also open PowerShell and use the "cd" command to change directories.
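For example, if you extracted everything to C:\llama.cpp (an illustrative path), you would run:
cd C:\llama.cpp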
Copy and paste the following command and press "Enter". We are running the main.exe file with arguments that set the model path (-m), interactive instruction mode (-i and -ins), the number of layers to offload to the GPU (--n-gpu-layers), colored output (--color), and the system prompt (-p). If you run out of GPU memory, lower the --n-gpu-layers value; omitting it runs the model entirely on the CPU.
./main.exe -m .\Models\llama2_7b\llama-2-7b-chat.Q5_K_M.gguf -i --n-gpu-layers 32 -ins --color -p "<<SYS>> As an ai assistant, your core values include being supportive, considerate, and truthful, ensuring that every interaction is guided by clarity and helpfulness. <</SYS>>"
Our llama.cpp CLI program has been successfully initialized with the system prompt. It tells us that it is a helpful AI assistant and shows several commands to use.
Let's try LLaMA 2 in PowerShell by providing a prompt. We have asked a simple question about the age of the Earth.
The answer is precise. Let's ask a follow-up question about Earth.
As you can see, the model has provided us with many interesting facts about our planet.
You can ask the AI assistant to generate code and explanations in the terminal, which you can easily copy and use in your IDE.
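For instance, at the interactive prompt you might type something like (the prompt itself is just an illustration):
Write a Python function that checks whether a number is prime, and explain how it works.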
Perfect.
Running LLaMA 2 locally provides a powerful yet easy-to-use chatbot experience that adapts to your needs. By following this simple guide, you can set up your own private chatbot in no time, without relying on paid services.
The main benefits of running LLaMA 2 locally are full control over your data and conversations, as well as no usage limits. You can chat with your bot as much as you want and even modify it to improve responses.
While less convenient than an instantly available cloud AI API, a local setup provides peace of mind regarding data privacy.
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid holds a Master's degree in Technology Management and a Bachelor's degree in Telecommunications Engineering. His vision is to build an artificial intelligence product using a graph neural network for students struggling with mental illness.