DEEP LEARNING WITH MULTIPLE GPUS
As deep learning models (especially LLMs) continue to grow, the need for more GPU memory (VRAM) increases for developing and using them locally. Building or obtaining a multi-GPU machine is only the first part of the challenge: most libraries and applications use only one GPU by default, so the machine also needs proper drivers along with libraries that can take advantage of a multi-GPU setup.
This story provides a guide on how to set up a multi-GPU (Nvidia) Linux machine with important libraries. Hopefully this will save you some time experimenting and allow you to get started with your development.
At the end, links are provided to popular open source libraries that can take advantage of multi-GPU setup for deep learning.
Aim
Set up a multi-GPU Linux system with the necessary libraries, such as the CUDA Toolkit and PyTorch, to get started with deep learning. The same steps also apply to a single-GPU machine.
We will install 1) CUDA Toolkit, 2) PyTorch and 3) Miniconda to get started with deep learning using frameworks like exllamaV2 and torchtune.
All libraries and information mentioned in this story are open source and/or publicly available.
Getting Started
Check the number of GPUs installed on the machine by running the nvidia-smi command in the terminal. It should print a list of all the installed GPUs. If there is a discrepancy, or if the command does not work, first install the Nvidia drivers for your version of Linux and make sure nvidia-smi prints a list of all the GPUs installed on your machine.
Follow this page to install Nvidia drivers if you haven't already:
How to Install NVIDIA Drivers on Ubuntu 22.04 — Linux Tutorials — Learn Linux Setup (Source: linuxconfig.org)
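If you'd rather check from a script, here is a minimal Python sketch that counts the installed GPUs by calling nvidia-smi (this assumes the driver is already installed and nvidia-smi is on your PATH):

import subprocess

# Ask the driver for the list of GPUs; each one is printed on its own line.
result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
print(result.stdout)

# Count the lines that describe a GPU.
gpus = [line for line in result.stdout.strip().splitlines() if line.startswith("GPU")]
print(f"Detected {len(gpus)} GPU(s)")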
Step 1: Install CUDA Toolkit
First check whether there is an existing CUDA folder at /usr/local/cuda-xx; if so, a version of CUDA is already installed. If you already have the desired CUDA toolkit installed (check with the nvcc command in your terminal), skip to Step 2.
Check the CUDA version required for the PyTorch library you want: Start locally | PyTorch (we are installing CUDA 12.1).
Go to CUDA Toolkit 12.1 Downloads | NVIDIA Developer for the Linux commands to install CUDA 12.1 (choose your operating system version and the corresponding "deb (local)" installer type).
Terminal commands for the base installer will appear depending on the options chosen. Copy, paste, and run them in your Linux terminal to install the CUDA toolkit. For example, for x86_64 Ubuntu 22.04, open a terminal in the Downloads folder and run the following commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
While installing the CUDA toolkit, the installer may request a kernel update. If any popup appears in the terminal asking to update the kernel, press esc to cancel it. Do not update the kernel at this stage, as it may break the Nvidia drivers!
Reboot the Linux machine after the installation. The nvcc command still won't work yet: you must first add the CUDA installation to your PATH. Open the .bashrc file using the nano editor:
nano /home/$USER/.bashrc
Scroll to the end of the .bashrc
file and add these two lines:
export PATH="/usr/local/cuda-12.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH"
Note that if you install a different CUDA version in the future, you should change cuda-12.1 in these lines to the matching cuda-xx folder, 'xx' being your CUDA version.
Save your changes and close the nano editor. On your keyboard, press the following:
ctrl + o --> save changes
enter or return --> accept the changes
ctrl + x --> close the editor
Close and reopen the terminal. The nvcc --version command should now print the CUDA version installed on your machine.
Step 2: Install Miniconda
Before installing PyTorch, it is best to install Miniconda and then install PyTorch within a Conda environment. It is also useful to create a new Conda environment for each project.
Open the terminal in the Downloads folder and run the following commands:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
# initialize conda
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
Close and reopen the terminal. The conda command should now work.
Step 3: Install PyTorch
(Optional) Create a new conda environment for your project. You can replace <env_name> with a name of your choice; I usually use the name of my project. You can use the conda activate <env_name> and conda deactivate commands before and after working on your project.
conda create -n <env_name> python=3.11
# activate the environment
conda activate <env_name>
Install the PyTorch library for your CUDA version. The following command is for the cuda-12.1 toolkit that we installed:
pip3 install torch torchvision torchaudio
The above command is taken from the PyTorch installation guide: Start locally | PyTorch.
After installing PyTorch, check the number of GPUs visible to PyTorch from the terminal:
python
>>> import torch
>>> print(torch.cuda.device_count())
8
This should print the number of GPUs installed in the system (8 in my case), and it should also match the number of GPUs listed by the nvidia-smi command.
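To cross-check the two lists, you can also print each GPU's name and memory from PyTorch; a minimal sketch using the standard torch.cuda API:

import torch

# Print the name and total memory of every GPU visible to PyTorch.
# The indices should line up with nvidia-smi (assuming CUDA_VISIBLE_DEVICES is not set).
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")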
Voila! You're all set to start working on your deep learning projects that leverage multiple GPUs.
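As a quick sanity check that work is actually spread across the GPUs, here is a minimal sketch using PyTorch's built-in nn.DataParallel (the toy model and batch size are placeholders, not taken from any of the libraries below):

import torch
import torch.nn as nn

# A toy model; DataParallel replicates it on every visible GPU.
model = nn.DataParallel(nn.Linear(512, 10)).cuda()

# The input batch is split along dimension 0 and scattered across the GPUs.
x = torch.randn(64, 512).cuda()
out = model(x)
print(out.shape)  # torch.Size([64, 10])

For real training runs, libraries like torchtune use DistributedDataParallel instead, but this sketch is enough to confirm that all GPUs are being exercised.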
1. To get started, you can clone a popular model from Hugging Face (see the sketch after this list):
2. For inference (running LLMs), clone and install exllamaV2 in a separate environment. It uses all your GPUs for faster inference (see my Medium page for a detailed tutorial).
3. For fine-tuning or training, you can clone and install torchtune. Follow the instructions to full finetune or lora finetune your models, taking advantage of all your GPUs (see my Medium page for a detailed tutorial).
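For item 1 above, here is a minimal sketch of downloading a model with the huggingface_hub library (install it with pip install huggingface_hub; the repo id below is only an example, and gated models also require an access token):

from huggingface_hub import snapshot_download

# Download a full model repository from the Hugging Face Hub to the local cache.
# Replace the repo_id with the model you actually want.
local_dir = snapshot_download(repo_id="mistralai/Mistral-7B-v0.1")
print(f"Model downloaded to: {local_dir}")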
This guide walks you through the machine setup required for multi-GPU deep learning. Now you can start working on any project that takes advantage of multiple GPUs, like torchtune for faster development!
Stay tuned for more detailed tutorials on exllamaV2 and torchtune!