CuPy is a Python library that provides NumPy- and SciPy-compatible arrays designed for GPU-accelerated computing. By swapping NumPy calls for their CuPy equivalents, you can run your code on NVIDIA CUDA or AMD ROCm platforms. This lets you perform array operations with GPU acceleration, resulting in faster processing of large arrays.
By swapping just a few lines of code, you can take advantage of the enormous parallel processing power of GPUs to significantly speed up matrix operations such as indexing, normalization, and matrix multiplication.
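To illustrate what that swap looks like in practice, here is a small sketch using the array-module-agnostic pattern. The `normalize_rows` helper is my own example, not from the original article: it takes the array module as a parameter, so the identical code runs on the CPU with NumPy or on the GPU with CuPy. It is shown with NumPy so it runs anywhere.

```python
import numpy as np

def normalize_rows(xp, a):
    # Divide each row by its L2 norm -- the syntax is identical
    # whether xp is numpy (CPU) or cupy (GPU).
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms

a = np.array([[3.0, 4.0], [6.0, 8.0]])
print(normalize_rows(np, a))
# [[0.6 0.8]
#  [0.6 0.8]]
# On a CUDA machine: import cupy as cp; normalize_rows(cp, cp.asarray(a))
```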
CuPy also provides access to low-level CUDA features. It allows you to pass ndarrays to existing CUDA C/C++ programs using RawKernels, optimize performance with Streams, and call CUDA Runtime APIs directly.
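As a sketch of the RawKernel API (my example, not from the original article): the CUDA C source below defines an element-wise doubling kernel that CuPy compiles at runtime. Launching it needs CuPy and a CUDA GPU, so the launch is guarded here.

```python
# Sketch: compiling and launching raw CUDA C from Python with cp.RawKernel.
source = r'''
extern "C" __global__
void double_elements(const float* x, float* y, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) y[i] = 2.0f * x[i];
}
'''

try:
    import cupy as cp
    kernel = cp.RawKernel(source, "double_elements")
    x = cp.arange(8, dtype=cp.float32)
    y = cp.empty_like(x)
    kernel((1,), (8,), (x, y, cp.int32(x.size)))  # (grid, block, args)
    print(y)
except ImportError:
    print("CuPy not installed; kernel source shown for illustration only.")
```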
You can install CuPy using pip, but first you must find the correct CUDA version using the following command.
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
It appears that the current version of Google Colab uses CUDA version 11.8. Therefore, we will proceed to install the cupy-cuda11x version.
If you are running an older version of CUDA, I have provided a table below to help you determine the appropriate CuPy package to install.
Table of CuPy packages by CUDA version (from the CuPy 12.2.0 documentation)
After selecting the correct version, we will install the Python package using pip.
pip install cupy-cuda11x
If you have Anaconda installed, you can also use the conda command to automatically detect and install the correct version of the CuPy package.
conda install -c conda-forge cupy
In this section, we will compare the syntax of CuPy with NumPy; they are about 95% similar. Instead of np, you will use cp.
First, we will create a NumPy array and a CuPy array from a Python list. After that, we will calculate the L2 norm of the vector.
import cupy as cp
import numpy as np
x = [3, 4, 5]
x_np = np.array(x)
x_cp = cp.array(x)
l2_np = np.linalg.norm(x_np)
l2_cp = cp.linalg.norm(x_cp)
print("Numpy: ", l2_np)
print("Cupy: ", l2_cp)
As we can see, we obtained similar results.
Numpy: 7.0710678118654755
Cupy: 7.0710678118654755
To convert a NumPy array to a CuPy array, you can simply use cp.asarray(X).
x_array = np.array((10, 22, 30))
x_cp_array = cp.asarray(x_array)
type(x_cp_array)
Or use .get() to convert a CuPy array back to a NumPy array.
x_np_array = x_cp_array.get()
type(x_np_array)
In this section, we will compare the performance of NumPy and CuPy.
We will use time.time() to measure execution time. First, we'll create a 3D NumPy array and apply some math functions to it.
import time
# NumPy and CPU Runtime
s = time.time()
x_cpu = np.ones((1000, 100, 1000))
np_result = np.sqrt(np.sum(x_cpu**2, axis=-1))
e = time.time()
np_time = e - s
print("Time consumed by NumPy: ", np_time)
Time consumed by NumPy: 0.5474584102630615
Similarly, we will create a 3D CuPy array, perform math operations, and time its performance.
# CuPy and GPU Runtime
s = time.time()
x_gpu = cp.ones((1000, 100, 1000))
cp_result = cp.sqrt(cp.sum(x_gpu**2, axis=-1))
e = time.time()
cp_time = e - s
print("\nTime consumed by CuPy: ", cp_time)
Time consumed by CuPy: 0.001028299331665039
To calculate the difference, we will divide the NumPy time by the CuPy time. It looks like we got over a 500X performance increase using CuPy.
diff = np_time/cp_time
print(f'\nCuPy is {diff:.2f}X faster than NumPy')
CuPy is 532.39X faster than NumPy
Note: For best results, it is recommended to perform a few warm-up runs to minimize time fluctuations.
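A sketch of what such a benchmark could look like (the `benchmark` helper below is my own, not from the article): it adds warm-up runs and, because CuPy launches GPU kernels asynchronously, it synchronizes the device before stopping the clock; without that, you would mostly be timing the kernel launch rather than the computation. It runs here with NumPy and accepts CuPy unchanged.

```python
import time
import numpy as np

def benchmark(xp, shape=(500, 100, 500), n_warmup=2, n_runs=5):
    # Best-of-n timing with warm-up; xp is either numpy or cupy.
    x = xp.ones(shape)
    for _ in range(n_warmup):               # warm-up absorbs one-off costs
        xp.sqrt(xp.sum(x**2, axis=-1))      # (allocation, kernel compilation)
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        xp.sqrt(xp.sum(x**2, axis=-1))
        if xp.__name__ == "cupy":           # GPU work is asynchronous:
            xp.cuda.Device().synchronize()  # wait for the kernels to finish
        times.append(time.perf_counter() - start)
    return min(times)

print(f"NumPy best of 5: {benchmark(np):.4f} s")
# On a GPU: import cupy as cp; print(f"CuPy: {benchmark(cp):.4f} s")
```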
Beyond its speed advantage, CuPy offers superior multi-GPU support, allowing you to harness the collective power of multiple GPUs.
Also, you can check my Colab notebook if you want to compare the results.
In conclusion, CuPy provides a simple way to accelerate NumPy code on NVIDIA GPUs. By simply making a few modifications to swap NumPy for CuPy, you can experience order of magnitude speedups in array calculations. This performance increase allows you to work with much larger data sets and models, enabling more advanced machine learning and scientific computing.
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid has a Master’s degree in Technology Management and a Bachelor’s degree in Telecommunications Engineering. His vision is to build an artificial intelligence product using a graph neural network for students struggling with mental illness.