CUDA (Nvidia)

  • [IMPORTANT] Carefully confirm version compatibility and system requirements, including those of the target software (e.g., PyTorch), before installation!
  • The latest version tends to carry a higher risk of compatibility problems
  • Nouveau drivers must first be disabled
  • Secure Boot should be turned off for easy installation of GPU drivers
  • Two options are available:
    • (A) Official instruction by Nvidia
    • (B) Using apt packages
  • Nvidia Container Toolkit is necessary to enable GPU support inside a Docker container
## system
uname -a
cat /etc/*release
## gcc
gcc --version
## cuda-capable gpu
lspci | grep -i nvidia
## nouveau drivers
lsmod | grep nouveau
## secure boot
sudo dmesg | grep Secure
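
The notes above require Nouveau to be disabled but do not show how. A minimal sketch of one common approach, assuming a Debian/Ubuntu system that uses modprobe.d and initramfs-tools. The helper name `write_nouveau_blacklist` and the file name `blacklist-nouveau.conf` are illustrative choices, not part of the original notes.

```shell
# Sketch: generate a modprobe configuration that blacklists nouveau.
# write_nouveau_blacklist is a hypothetical helper; $1 is the output path.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Typical usage (requires root, so left commented out here):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/
#   sudo update-initramfs -u   # rebuild the initramfs so the blacklist applies at boot
#   sudo reboot
#   lsmod | grep nouveau       # should print nothing after the reboot
```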

https://developer.nvidia.com/cuda-downloads

## for add-apt-repository
sudo apt-get install software-properties-common
## kernel headers are required !!!
sudo apt-get install linux-headers-$(uname -r)
## Linux > x86_64 > Debian > 11 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-debian11-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda
dpkg -l | grep cuda
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF
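
The `${PATH:+:${PATH}}` expansion written by the heredoc above appends `:$PATH` only when PATH is already non-empty, which avoids a stray trailing colon (an empty entry is treated as the current directory). A small sketch of the same idiom, with `prepend_dir` as a hypothetical helper name:

```shell
# Sketch of the ${VAR:+...} idiom used in /etc/profile.d/cuda.sh.
# prepend_dir is a hypothetical helper: $1 = directory, $2 = existing list.
prepend_dir() {
    # ${2:+:$2} expands to ":<list>" only when $2 is set and non-empty
    printf '%s\n' "$1${2:+:$2}"
}

# prepend_dir /usr/local/cuda/bin /usr/bin   prints /usr/local/cuda/bin:/usr/bin
# prepend_dir /usr/local/cuda/lib64 ""       prints /usr/local/cuda/lib64
```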
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P0    28W /  70W |      2MiB / 15360MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
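
The banner line of this output reports the driver version and the CUDA version that the driver supports. When comparing against a target framework's requirements, it can help to extract those values in a script. A minimal sketch, assuming the banner layout shown above (the function names are hypothetical, and the layout may differ between driver releases):

```shell
# Sketch: extract versions from the nvidia-smi banner line read on stdin.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p'
}
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a machine with the driver installed:
#   nvidia-smi | smi_driver_version
#   nvidia-smi | smi_cuda_version
```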

https://developer.nvidia.com/cudnn

sudo dpkg -i cudnn-local-repo-debian11-8.9.1.23_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-debian11-8.9.1.23/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples
dpkg -l | grep cudnn

https://docs.nvidia.com/deploy/cuda-compatibility/
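
Each CUDA release needs a sufficiently recent driver. As an illustration only, the sketch below encodes the three driver/CUDA pairings that appear on this page (520.61.05 with 11.8, 470 with 11.4, 440.33.01 with 10.2) and compares versions with GNU `sort -V`. The function names are hypothetical, and the authoritative minimum versions are in the compatibility documentation linked above.

```shell
# Sketch: check an installed driver against the driver/CUDA pairings seen on this page.
# version_ge: true when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# observed_driver_for: driver version paired with a CUDA release on this page.
observed_driver_for() {
    case "$1" in
        11.8) echo 520.61.05 ;;
        11.4) echo 470 ;;
        10.2) echo 440.33.01 ;;
        *)    return 1 ;;   # not covered by this page; consult the NVIDIA docs
    esac
}

# driver_ok_for: $1 = CUDA release, $2 = installed driver version.
# Returns 0 if the driver is new enough, 1 if it is too old, 2 if the release is unknown.
driver_ok_for() {
    required=$(observed_driver_for "$1") || return 2
    version_ge "$2" "$required"
}

# Usage:
#   driver_ok_for 11.4 470.182.03 && echo "driver is new enough"
```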

## add the graphics-drivers PPA if a newer version than the official Ubuntu repository provides is needed
#sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
## check which version is recommended
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
#sudo ubuntu-drivers autoinstall
## cuda version depends on driver version: cuda 11.4 = nvidia-driver-470
sudo apt purge 'nvidia*' 'libnvidia*'
sudo apt install nvidia-driver-470
## install nvcc (as needed)
sudo apt install nvidia-cuda-toolkit
sudo reboot
dpkg -l | grep nvidia
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8    10W /  70W |     70MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       880      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      1091      G   /usr/bin/gnome-shell                9MiB |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
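
Note that the toolkit reported here (release 10.1) is older than the CUDA version shown by nvidia-smi above (11.4); the two numbers come from different packages and need not match. A minimal sketch for extracting the toolkit release in a script, assuming the output format shown above (`nvcc_release` is a hypothetical helper name):

```shell
# Sketch: print the toolkit release (e.g. 10.1) from `nvcc --version` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage on a machine with the toolkit installed:
#   nvcc --version | nvcc_release
```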

Troubleshooting - Could not open Aplay

## ERROR:root:could not open aplay -l
## Traceback (most recent call last):
##   File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
##     aplay = subprocess.Popen(
##   File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
##     self._execute_child(args, executable, preexec_fn, close_fds,
##   File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
##     raise child_exception_type(errno_num, err_msg, err_filename)
## FileNotFoundError: [Errno 2] No such file or directory: 'aplay'
## Install the Advanced Linux Sound Architecture (ALSA) if sound devices are needed
sudo apt install alsa-base

https://developer.nvidia.com/cudnn

sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.3.28/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples
dpkg -l | grep cudnn

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$(. /etc/os-release;echo $ID$VERSION_ID)/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt -y install nvidia-container-toolkit
## Restart docker (as needed)
sudo systemctl restart docker
nvidia-container-cli info
NVRM version:   470.199.02
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-866985fb-8c54-aee9-977b-bdea38b0bd75
Bus Location:   00000000:00:04.0
Architecture:   7.5
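
To confirm that containers can actually see the GPU, a common smoke test is to run nvidia-smi inside a CUDA base image with Docker's `--gpus` flag. A minimal sketch follows; the wrapper name `docker_gpu_smoke_test`, the `DOCKER` override variable, and the image tag in the usage comment are assumptions, so pick a tag that matches the installed driver.

```shell
# Sketch: run nvidia-smi inside a container with all GPUs exposed.
# $1 = image name; DOCKER can be overridden (e.g. DOCKER=echo for a dry run).
docker_gpu_smoke_test() {
    "${DOCKER:-docker}" run --rm --gpus all "$1" nvidia-smi
}

# Usage (the image tag is an assumption; check the registry for available tags):
#   docker_gpu_smoke_test nvidia/cuda:11.4.3-base-ubuntu20.04
```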

https://developer.nvidia.com/cuda-downloads

## kernel headers are required !!!
sudo apt-get install linux-headers-$(uname -r)
## Linux > x86_64 > Ubuntu > 18.04 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
dpkg -l | grep cuda
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    23W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

https://developer.nvidia.com/cudnn

sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb
dpkg -l | grep cudnn
cp -r /usr/src/cudnn_samples_v7 ~/
cd ~/cudnn_samples_v7/mnistCUDNN
make clean
make
./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 40  Capabilities 7.5, SmClock 1590.0 Mhz, MemSize (Mb) 15109, MemClock 5001.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
...
Result of classification: 1 3 5
Test passed!
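
The integer printed by cudnnGetVersion() packs the version as major*1000 + minor*100 + patch, which is why 7605 corresponds to 7.6.5 above. A small sketch of the decoding, intended for the cuDNN 7 and 8 releases used on this page (`decode_cudnn_version` is a hypothetical helper; other major releases may use a different encoding):

```shell
# Sketch: decode a cuDNN 7/8 version integer such as 7605 into major.minor.patch.
decode_cudnn_version() {
    v=$1
    echo "$((v / 1000)).$((v % 1000 / 100)).$((v % 100))"
}

# Usage:
#   decode_cudnn_version 7605
```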

Daiphys Technologies LLC - https://www.daiphys.com/

  • Last modified: 2023/07/20 14:17
  • by Daiphys