CUDA (Nvidia)

Getting Started

Tips

[IMPORTANT] Thoroughly check version compatibility and system requirements, including those of the target software (e.g., PyTorch), before installation!!!
Newer releases tend to carry a higher risk of running into problems
The Nouveau drivers must be disabled first
Secure Boot should be turned off to make GPU driver installation easier
Two installation options are available:
- (A) Official instruction by Nvidia
- (B) Using apt packages
Nvidia Container Toolkit is necessary to enable GPU support inside a Docker container
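
Checking compatibility mostly comes down to comparing dotted version strings, for example the CUDA version a PyTorch build targets against the highest CUDA version the driver supports. A minimal sketch, assuming GNU coreutils (for sort -V); version_ge is a hypothetical helper name, not part of any Nvidia tool:

```shell
#!/usr/bin/env bash
# version_ge A B: succeed (exit 0) when version A >= version B.
# Hypothetical helper; relies on GNU `sort -V` (version sort).
version_ge() {
  # After version sorting, the smaller value comes first;
  # A >= B exactly when B is that first (smallest) value.
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: is a driver that supports up to CUDA 11.4 enough for an 11.8 build?
if version_ge 11.4 11.8; then
  echo "driver CUDA version is sufficient"
else
  echo "driver CUDA version is too old"
fi
```

Version sort (rather than plain string comparison) matters because 11.10 must rank above 11.8.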

Preparation

## system
uname -a
cat /etc/*release

## gcc
gcc --version

## cuda-capable gpu
lspci | grep -i nvidia

## nouveau drivers
lsmod | grep nouveau

## secure boot
sudo dmesg | grep Secure
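
If lsmod | grep nouveau prints anything, Nouveau is loaded. NVIDIA's Linux installation guide disables it by blacklisting the module in a file under /etc/modprobe.d and rebuilding the initramfs. A minimal sketch of the file-writing step; the function name and its optional destination argument are hypothetical, added so the step can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# write_nouveau_blacklist [DEST]: write the two modprobe directives that
# keep the Nouveau kernel module from loading. DEST defaults to the
# conventional path used in NVIDIA's installation guide.
write_nouveau_blacklist() {
  local dest="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$dest"
}

# Try it on a scratch file first (no root needed)
tmpfile="$(mktemp)"
write_nouveau_blacklist "$tmpfile"
cat "$tmpfile"
rm -f "$tmpfile"
```

On the real system the file has to be written as root (e.g. printf 'blacklist nouveau\noptions nouveau modeset=0\n' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf), followed by sudo update-initramfs -u and sudo reboot; afterwards lsmod | grep nouveau should print nothing.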

cuDNN

https://docs.nvidia.com/deeplearning/cudnn/
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html

Install CUDA (11.8) on Debian 11 : Option (A)

https://developer.nvidia.com/cuda-downloads

## for add-apt-repository
sudo apt-get install software-properties-common

## kernel headers are required !!!
sudo apt-get install linux-headers-$(uname -r)

## Linux > x86_64 > Debian > 11 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-debian11-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda

dpkg -l | grep cuda

## make CUDA available in login shells (takes effect at the next login)
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF
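
Inside the heredoc, \$ is escaped so that the expansions are written literally into /etc/profile.d/cuda.sh and only evaluated when that script is sourced. There, ${PATH:+:${PATH}} expands to a colon followed by the old value only if the variable is set and non-empty, so no empty entry is appended (in PATH, an empty entry means the current directory). A small sketch of the same expansion, using a hypothetical helper prepend_path:

```shell
#!/usr/bin/env bash
# prepend_path DIR CURRENT: print DIR, followed by ":CURRENT" only if CURRENT
# is non-empty, mirroring the ${VAR:+:${VAR}} idiom used in cuda.sh.
prepend_path() {
  local dir="$1" current="$2"
  printf '%s\n' "${dir}${current:+:${current}}"
}

prepend_path /usr/local/cuda/lib64 ""          # prints /usr/local/cuda/lib64
prepend_path /usr/local/cuda/bin "/usr/bin"    # prints /usr/local/cuda/bin:/usr/bin
```

To use the new settings in the current shell without logging in again, source /etc/profile.d/cuda.sh; command -v nvcc should then point into /usr/local/cuda/bin.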

## reboot so that the newly installed driver is loaded
sudo reboot

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P0    28W /  70W |      2MiB / 15360MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
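
The CUDA Version in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily the toolkit that is installed. A minimal sketch for extracting it in scripts, assuming the header layout shown above (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# driver_cuda_version: read nvidia-smi output on stdin and print the value
# after "CUDA Version:" (assumes the header format shown above).
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a machine with the driver installed:
#   nvidia-smi | driver_cuda_version
echo '| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |' \
  | driver_cuda_version
```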

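Install cuDNN (8.9.1)

https://developer.nvidia.com/cudnn

## download the local installer (deb) for Debian 11 from the page above first
sudo dpkg -i cudnn-local-repo-debian11-8.9.1.23_1.0-1_amd64.deb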
sudo cp /var/cudnn-local-repo-debian11-8.9.1.23/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples

dpkg -l | grep cudnn
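
Besides dpkg -l, the installed cuDNN version can be read from the version macros in its header (cudnn_version.h for cuDNN 8, cudnn.h for cuDNN 7). A minimal sketch; the function name is hypothetical, and the header path in the usage comment is an assumption that may differ per system:

```shell
#!/usr/bin/env bash
# cudnn_header_version HEADER: print MAJOR.MINOR.PATCH built from the
# CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL macros defined in HEADER.
cudnn_header_version() {
  local header="$1" major minor patch
  major="$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")"
  minor="$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")"
  patch="$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")"
  # Fail if any macro is missing (wrong file or header not installed)
  [ -n "$major" ] && [ -n "$minor" ] && [ -n "$patch" ] || return 1
  echo "${major}.${minor}.${patch}"
}

# Usage (assumed path; for cuDNN 7 pass cudnn.h instead):
#   cudnn_header_version /usr/include/cudnn_version.h
```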

Install CUDA (11.4) on Ubuntu 20.04 : Option (B)

https://docs.nvidia.com/deploy/cuda-compatibility/

## if a driver newer than those in the official Ubuntu repositories is needed
#sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

## check which version is recommended
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
#sudo ubuntu-drivers autoinstall

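## the CUDA version supported depends on the driver version: nvidia-driver-470 supports up to CUDA 11.4
## remove existing Nvidia packages first (patterns are quoted so the shell does not expand them)
sudo apt purge 'nvidia*' 'libnvidia*'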
sudo apt install nvidia-driver-470
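
The pairing in the comment above can be turned into a quick check. A minimal sketch that only knows the three pairs used on this page (CUDA 10.2 with the 440 driver branch, 11.4 with 470, 11.8 with 520); the function names are hypothetical, the table is not exhaustive, and it ignores the minor-version and forward compatibility described at https://docs.nvidia.com/deploy/cuda-compatibility/:

```shell
#!/usr/bin/env bash
# required_driver_branch CUDA_VERSION: print the driver branch paired with
# that CUDA version on this page (hypothetical, non-exhaustive table).
required_driver_branch() {
  case "$1" in
    10.2) echo 440 ;;
    11.4) echo 470 ;;
    11.8) echo 520 ;;
    *) return 1 ;;  # unknown here: consult the compatibility documentation
  esac
}

# driver_supports DRIVER_VERSION CUDA_VERSION: succeed when the driver's
# major version (e.g. 470 from 470.182.03) is at least the required branch.
driver_supports() {
  local required
  required="$(required_driver_branch "$2")" || return 1
  [ "${1%%.*}" -ge "$required" ]
}

driver_supports 470.182.03 11.4 && echo "470.182.03 can run CUDA 11.4"
driver_supports 470.182.03 11.8 || echo "470.182.03 is too old for CUDA 11.8"
```

On a running system, the installed driver version can be obtained with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed as the first argument.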

## install nvcc (as needed); this Ubuntu package ships its own toolkit (CUDA 10.1 on Ubuntu 20.04, see the nvcc output below)
sudo apt install nvidia-cuda-toolkit

sudo reboot

dpkg -l | grep nvidia

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8    10W /  70W |     70MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       880      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      1091      G   /usr/bin/gnome-shell                9MiB |
+-----------------------------------------------------------------------------+

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
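
Here nvcc reports release 10.1 while nvidia-smi reports CUDA 11.4: the nvidia-cuda-toolkit package from the Ubuntu 20.04 repositories ships its own, older toolkit, and a newer driver can run code built with an older toolkit. A minimal sketch for extracting the release number in scripts, assuming the output format shown above (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# nvcc_release: read `nvcc --version` output on stdin and print the
# number after "release" (e.g. 10.1), assuming the format shown above.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Usage: nvcc --version | nvcc_release
echo 'Cuda compilation tools, release 10.1, V10.1.243' | nvcc_release
```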

Troubleshooting - Could not open Aplay

## ERROR:root:could not open aplay -l
## Traceback (most recent call last):
##   File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
##     aplay = subprocess.Popen(
##   File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
##     self._execute_child(args, executable, preexec_fn, close_fds,
##   File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
##     raise child_exception_type(errno_num, err_msg, err_filename)
## FileNotFoundError: [Errno 2] No such file or directory: 'aplay'

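## the error comes from the sl-modem detection script, which calls aplay; install the Advanced Linux Sound Architecture (ALSA) only if sound devices are needed (aplay itself is in the alsa-utils package)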
sudo apt install alsa-base
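
The traceback comes from the sl-modem detection script of ubuntu-drivers-common, which runs aplay -l. A minimal sketch for checking in advance whether aplay is on PATH; has_cmd is a hypothetical helper, and on Ubuntu the aplay command is provided by the alsa-utils package:

```shell
#!/usr/bin/env bash
# has_cmd NAME: succeed if NAME resolves to a command (or shell builtin).
has_cmd() {
  command -v "$1" > /dev/null 2>&1
}

if has_cmd aplay; then
  echo "aplay found"
else
  echo "aplay missing; ubuntu-drivers will log the sl-modem error above"
fi
```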

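Install cuDNN (8.9.3)

https://developer.nvidia.com/cudnn

## download the local installer (deb) for Ubuntu 20.04 from the page above first
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb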
sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.3.28/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples

dpkg -l | grep cudnn
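
The cuDNN package version string ends with the CUDA version the build targets (e.g., 7.6.5.32-1+cuda10.2 in the file names further below), which helps confirm that the installed cuDNN matches the installed CUDA. A minimal sketch for reading that suffix; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# cudnn_cuda_suffix VERSION: print the CUDA version encoded after "+cuda"
# in a cuDNN Debian package version, e.g. 7.6.5.32-1+cuda10.2 -> 10.2.
cudnn_cuda_suffix() {
  case "$1" in
    *+cuda*) printf '%s\n' "${1##*+cuda}" ;;
    *) return 1 ;;  # no CUDA suffix in this version string
  esac
}

# Usage on an installed system:
#   cudnn_cuda_suffix "$(dpkg-query -W -f='${Version}' libcudnn8)"
cudnn_cuda_suffix 7.6.5.32-1+cuda10.2
```

If the repository offers more than one build, apt can be told which one to install by giving the full version (libcudnn8=VERSION), with VERSION taken from apt-cache policy libcudnn8.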

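Install Nvidia Container Toolkit

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

## apt-key is deprecated on newer releases; the install guide above describes the keyring-based setup
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -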
curl -s -L https://nvidia.github.io/libnvidia-container/$(. /etc/os-release;echo $ID$VERSION_ID)/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update

sudo apt -y install nvidia-container-toolkit

## Restart docker (as needed)
sudo systemctl restart docker

nvidia-container-cli info

NVRM version:   470.199.02
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-866985fb-8c54-aee9-977b-bdea38b0bd75
Bus Location:   00000000:00:04.0
Architecture:   7.5
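
nvidia-container-cli info confirms that the toolkit can see the driver and the GPU. A minimal sketch for pulling single fields out of that output, assuming the "Key: value" layout shown above (the function name is hypothetical). An end-to-end check is to start a container with docker run --rm --gpus all IMAGE nvidia-smi, where IMAGE is a placeholder for any CUDA base image whose CUDA version does not exceed the one reported here.

```shell
#!/usr/bin/env bash
# container_info_field KEY: read `nvidia-container-cli info` output on stdin
# and print the value of the first line that starts with "KEY:".
container_info_field() {
  awk -v key="$1:" 'index($0, key) == 1 {
    sub(/^[^:]*:[ \t]*/, "")   # drop the key, its colon, and the padding
    print
    exit
  }'
}

# Usage: nvidia-container-cli info | container_info_field "CUDA version"
printf 'NVRM version:   470.199.02\nCUDA version:   11.4\n' \
  | container_info_field "CUDA version"
```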

Install CUDA (10.2) on Ubuntu 18.04 : Option (A)

https://developer.nvidia.com/cuda-downloads

## kernel headers are required !!!
sudo apt-get install linux-headers-$(uname -r)

## Linux > x86_64 > Ubuntu > 18.04 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

dpkg -l | grep cuda

## make CUDA available in login shells (takes effect at the next login)
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF

## reboot so that the newly installed driver is loaded
sudo reboot

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    23W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

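Install cuDNN (7.6.5)

https://developer.nvidia.com/cudnn

## download the runtime, developer, and documentation packages (deb) for CUDA 10.2 from the page above first
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb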
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb

dpkg -l | grep cudnn

Testing CUDA/cuDNN

## build and run the mnistCUDNN sample shipped with libcudnn7-doc
cp -r /usr/src/cudnn_samples_v7 ~/
cd ~/cudnn_samples_v7/mnistCUDNN
make clean
make
./mnistCUDNN

cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 40  Capabilities 7.5, SmClock 1590.0 Mhz, MemSize (Mb) 15109, MemClock 5001.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
...
Result of classification: 1 3 5
Test passed!
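
For scripted checks (e.g., after provisioning a machine), the sample's output can be reduced to an exit status. A minimal sketch, assuming the sample keeps printing Test passed! on success as shown above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# mnist_passed: read mnistCUDNN output on stdin; succeed only if a line
# starting with "Test passed!" is present.
mnist_passed() {
  grep -q '^Test passed!'
}

# Usage, from ~/cudnn_samples_v7/mnistCUDNN:
#   if ./mnistCUDNN | mnist_passed; then echo "cuDNN OK"; else echo "cuDNN test failed"; fi
```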

References

https://developer.nvidia.com/cuda-downloads
https://developer.nvidia.com/cuda-toolkit-archive
https://developer.nvidia.com/cudnn
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
https://docs.nvidia.com/deeplearning/cudnn/install-guide/
https://docs.nvidia.com/deploy/cuda-compatibility/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Acknowledgments

Daiphys is a professional services company in research and development of leading-edge technologies in science and engineering.
Get started accelerating your business through our deep expertise in R&D with AI, quantum computing, and space development; please get in touch with Daiphys today!

Daiphys Technologies LLC - https://www.daiphys.com/
