CUDA (Nvidia)
Getting Started
Tips
- [IMPORTANT] Carefully confirm version compatibility and system requirements, including those of the target software (e.g., PyTorch), before installation!
- The latest release tends to carry a higher risk of compatibility problems
- Nouveau drivers must be disabled first
- Secure Boot should be turned off to simplify GPU driver installation
- Two installation options are available:
  - (A) Official instructions from Nvidia
  - (B) apt packages
- Nvidia Container Toolkit is necessary to enable GPU support inside a Docker container
Preparation
## system
uname -a
cat /etc/*release

## gcc
gcc --version

## cuda-capable gpu
lspci | grep -i nvidia

## nouveau drivers
lsmod | grep nouveau

## secure boot
sudo dmesg | grep Secure
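If the nouveau check above prints anything, the nouveau module is loaded and has to be blacklisted before the Nvidia driver is installed. The following is a minimal sketch of the usual approach, not a step taken from this page: the helper name nouveau_blacklist and the file name blacklist-nouveau.conf are illustrative assumptions, and the update-initramfs step assumes a Debian or Ubuntu system.

```shell
## print the modprobe configuration that blacklists nouveau
## (nouveau_blacklist is a hypothetical helper name)
nouveau_blacklist() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

## preview the two configuration lines without changing the system
nouveau_blacklist

## to apply (requires root), write the file, rebuild the initramfs, and reboot
#nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#sudo update-initramfs -u
#sudo reboot
```

After the reboot, the nouveau check should print nothing.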
cuDNN
https://docs.nvidia.com/deeplearning/cudnn/
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html
Install CUDA (11.8) on Debian 11 : Option (A)
https://developer.nvidia.com/cuda-downloads
## for add-apt-repository
sudo apt-get install software-properties-common

## kernel headers are required !!!
sudo apt-get install linux-headers-$(uname -r)

## Linux > x86_64 > Debian > 11 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-debian11-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda
dpkg -l | grep cuda
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF
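The profile script above relies on the ${VAR:+:${VAR}} expansion, which appends ":$VAR" only when VAR is set and non-empty. This avoids a trailing colon, which matters because an empty entry in PATH or LD_LIBRARY_PATH is treated as the current directory. The backslashes in the heredoc keep the expansions from being evaluated at the moment the file is written. The sketch below illustrates the expansion on its own; join_path is a hypothetical helper used only for this demonstration, and /opt/lib is an arbitrary example directory.

```shell
## join_path NEW OLD: prepend NEW to the colon-separated list OLD
## (join_path is a hypothetical helper, not part of the setup above)
join_path() {
  ## ${2:+:$2} expands to ":$2" only when $2 is set and non-empty
  printf '%s\n' "$1${2:+:$2}"
}

## empty existing value: no trailing colon is produced
join_path /usr/local/cuda/lib64 ""

## non-empty existing value: the old value is kept after a colon
join_path /usr/local/cuda/lib64 /opt/lib
```

The first call prints /usr/local/cuda/lib64 and the second prints /usr/local/cuda/lib64:/opt/lib.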
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P0    28W /  70W |      2MiB / 15360MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Install cuDNN (8.9.1)
https://developer.nvidia.com/cudnn
sudo dpkg -i cudnn-local-repo-debian11-8.9.1.23_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-debian11-8.9.1.23/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples
dpkg -l | grep cudnn
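Besides the package listing, the installed cuDNN version can be read from the version macros in the development header. The sketch below parses the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL macros from standard input; the helper name cudnn_version_from_header is hypothetical, the three sample lines only mimic the macro format, and the header path in the final comment is an assumption that may differ between systems.

```shell
## read a cuDNN version header on stdin and print MAJOR.MINOR.PATCHLEVEL
## (cudnn_version_from_header is a hypothetical helper name)
cudnn_version_from_header() {
  awk '
    /^#define CUDNN_MAJOR /      { major = $3 }
    /^#define CUDNN_MINOR /      { minor = $3 }
    /^#define CUDNN_PATCHLEVEL / { patch = $3 }
    END { print major "." minor "." patch }
  '
}

## demonstration on sample lines in the header's macro format
printf '%s\n' \
  '#define CUDNN_MAJOR 8' \
  '#define CUDNN_MINOR 9' \
  '#define CUDNN_PATCHLEVEL 1' | cudnn_version_from_header

## on a real system (assumed path, adjust as needed):
#cudnn_version_from_header < /usr/include/cudnn_version.h
```

The demonstration prints 8.9.1.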
Install CUDA (11.4) on Ubuntu 20.04 : Option (B)
https://docs.nvidia.com/deploy/cuda-compatibility/
## if a newer driver than the official Ubuntu repositories provide is needed
#sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

## check which version is recommended
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
#sudo ubuntu-drivers autoinstall

## cuda version depends on driver version: cuda 11.4 = nvidia-driver-470
## (patterns are quoted so the shell does not expand them against local files)
sudo apt purge 'nvidia*' 'libnvidia*'
sudo apt install nvidia-driver-470

## install nvcc (as needed)
sudo apt install nvidia-cuda-toolkit
sudo reboot
dpkg -l | grep nvidia
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8    10W /  70W |     70MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       880      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      1091      G   /usr/bin/gnome-shell                9MiB |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
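Note that the two outputs above report different versions: the "CUDA Version" shown by nvidia-smi is the highest CUDA version the installed driver supports (11.4 here), while nvcc reports the version of the installed toolkit (10.1 here, as packaged in nvidia-cuda-toolkit). A rough sketch for extracting both values so they can be compared is shown below. The helper names smi_cuda_version and nvcc_release are hypothetical, and the sample lines are taken from the outputs above.

```shell
## extract the "CUDA Version" field from nvidia-smi output on stdin
## (smi_cuda_version is a hypothetical helper name)
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

## extract the toolkit release from "nvcc --version" output on stdin
## (nvcc_release is a hypothetical helper name)
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

## demonstration on the sample lines shown above
echo '| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |' | smi_cuda_version
echo 'Cuda compilation tools, release 10.1, V10.1.243' | nvcc_release

## on a real system:
#nvidia-smi | smi_cuda_version
#nvcc --version | nvcc_release
```

The demonstration prints 11.4 and then 10.1. A toolkit older than the driver-supported version is generally fine; the version compatibility page linked above describes the supported combinations.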
Troubleshooting - Could not open Aplay
## ERROR:root:could not open aplay -l
## Traceback (most recent call last):
##   File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
##     aplay = subprocess.Popen(
##   File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
##     self._execute_child(args, executable, preexec_fn, close_fds,
##   File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
##     raise child_exception_type(errno_num, err_msg, err_filename)
## FileNotFoundError: [Errno 2] No such file or directory: 'aplay'

## Install the Advanced Linux Sound Architecture (ALSA) if sound devices are important (as needed)
sudo apt install alsa-base
Install cuDNN (8.9.3)
https://developer.nvidia.com/cudnn
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.3.28/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install libcudnn8
sudo apt-get -y install libcudnn8-dev
sudo apt-get -y install libcudnn8-samples
dpkg -l | grep cudnn
Install Nvidia Container Toolkit
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$(. /etc/os-release;echo $ID$VERSION_ID)/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt -y install nvidia-container-toolkit
## Restart docker (as needed)
sudo systemctl restart docker
nvidia-container-cli info
NVRM version:   470.199.02
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Nvidia
GPU UUID:       GPU-866985fb-8c54-aee9-977b-bdea38b0bd75
Bus Location:   00000000:00:04.0
Architecture:   7.5
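To confirm that a container can actually see the GPU, a common smoke test is to run nvidia-smi inside a throwaway container. The sketch below wraps that in a small function. The function name gpu_container_check is hypothetical, the ubuntu image is an arbitrary choice, and the --gpus flag assumes Docker 19.03 or later; depending on the Docker setup, sudo may be needed.

```shell
## run nvidia-smi inside a temporary container to check GPU access
## (gpu_container_check is a hypothetical helper name)
gpu_container_check() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found" >&2
    return 1
  fi
  ## --rm removes the container afterwards; --gpus all exposes every GPU
  docker run --rm --gpus all ubuntu nvidia-smi
}

## usage:
#gpu_container_check
```

If the toolkit is working, the container should print the same kind of nvidia-smi table as on the host.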
Install CUDA (10.2) on Ubuntu 18.04 : Option (A)
https://developer.nvidia.com/cuda-downloads
## kernel headers are required !!! sudo apt-get install linux-headers-$(uname -r)
## Linux > x86_64 > Ubuntu > 18.04 > deb (local)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
dpkg -l | grep cuda
sudo tee /etc/profile.d/cuda.sh << EOF > /dev/null
export CUDA_HOME="/usr/local/cuda"
export PATH="\$CUDA_HOME/bin\${PATH:+:\${PATH}}"
export LD_LIBRARY_PATH="\$CUDA_HOME/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
EOF
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0    23W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Install cuDNN (7.6.5)
https://developer.nvidia.com/cudnn
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb
dpkg -l | grep cudnn
Testing CUDA/cuDNN
cp -r /usr/src/cudnn_samples_v7 ~/
cd ~/cudnn_samples_v7/mnistCUDNN
make clean
make
./mnistCUDNN

cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 7.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 40 Capabilities 7.5, SmClock 1590.0 Mhz, MemSize (Mb) 15109, MemClock 5001.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
...
Result of classification: 1 3 5
Test passed!
References
https://developer.nvidia.com/cuda-downloads
https://developer.nvidia.com/cuda-toolkit-archive
https://developer.nvidia.com/cudnn
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
https://docs.nvidia.com/deeplearning/cudnn/install-guide/
https://docs.nvidia.com/deploy/cuda-compatibility/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
Acknowledgments
Daiphys is a professional-service company for research and development of leading-edge technologies in science and engineering.
Daiphys Technologies LLC - https://www.daiphys.com/