Installing Nvidia Docker

Installing the Nvidia Driver

We recommend installing the Nvidia driver from the graphics drivers PPA.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

Detect the recommended Nvidia graphics driver:

ubuntu-drivers devices
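
When several drivers are listed, the one tagged recommended is the natural candidate to install. A minimal sketch of pulling that package name out of the listing, assuming each candidate appears as a "driver : <package> - ..." line with recommended as its last word (the helper name recommended_driver is hypothetical):

```shell
# Print the package on the "driver" line whose last field is "recommended".
# Reads `ubuntu-drivers devices`-style text on stdin (the line format is an assumption).
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Only query the real tool when it is installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | recommended_driver
fi
```

The printed name can then be passed to sudo apt install.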

Install the Nvidia driver (the steps below were run on an RTX 2060):

# Ubuntu 16.04: the search only finds 430, for CUDA < 10.2
apt-cache search nvidia
sudo apt install nvidia-430

# Ubuntu 18.04: the search finds 440, for CUDA <= 10.2
apt-cache search nvidia | grep ^nvidia-driver
sudo apt install nvidia-driver-440

For the CUDA version that corresponds to each driver, see CUDA Compatibility.

Finally, reboot with sudo reboot. After that, run nvidia-smi to print the Nvidia driver information:

$ nvidia-smi
Fri Apr 17 07:31:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     5W /  N/A |    263MiB /  5934MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1560      G   /usr/lib/xorg/Xorg                           144MiB |
|    0      1726      G   /usr/bin/gnome-shell                          76MiB |
|    0      2063      G   ...uest-channel-token=10544833948196615517    39MiB |
+-----------------------------------------------------------------------------+

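If you plan to install the CUDA Toolkit, read CUDA Compatibility first. The CUDA Toolkit installer bundles its own driver version, so it is best to install the toolkit and the driver separately, and to pick a suitable driver version directly from the official Nvidia site.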
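
Whether an installed driver is new enough for a given CUDA release comes down to a version comparison. A minimal sketch, assuming GNU sort (for its -V option) and using 440.33 as the minimum driver for CUDA 10.2; that threshold and the helper name version_ge are assumptions to confirm against CUDA Compatibility:

```shell
# Succeed when dotted version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum driver for CUDA 10.2; check the CUDA Compatibility page.
min_driver="440.33"

# Only query the GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$installed" "$min_driver"; then
        echo "driver $installed meets the minimum $min_driver"
    else
        echo "driver $installed is older than $min_driver"
    fi
fi
```

Comparing with sort -V avoids treating versions as decimal numbers, which would misorder values with more than two components.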

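Alternatively, download the driver from NVIDIA Driver Downloads and install it:

# Ctrl + Alt + F2/F3 switches to a text console; stop the graphical session before installing
sudo systemctl isolate multi-user.target

# run the downloaded installer, then reboot
sudo sh NVIDIA-Linux-x86_64-460.67.run
sudo reboot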

Installing Docker

# update the apt package index
sudo apt-get update
# install packages to allow apt to use a repository over HTTPS
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release

# add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# set up the stable repository
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# update the apt package index
sudo apt-get update
# install the latest version of Docker Engine and containerd
sudo apt-get install docker-ce docker-ce-cli containerd.io

Next, make Docker usable by a non-root user:

sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
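
Group membership is picked up when a session starts, which is why newgrp docker is run after usermod. A small sketch for checking whether the current shell already has the group; in_group_list is a hypothetical helper:

```shell
# Succeed if group name $1 appears in the space-separated list $2.
in_group_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` prints the group names of the current process.
if in_group_list docker "$(id -nG)"; then
    echo "this shell is in the docker group"
else
    echo "not in the docker group yet; log out and back in, or run: newgrp docker"
fi
```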


Installing Nvidia Docker

# set up Docker (can be skipped if Docker was already installed above)
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker

# add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
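
After the restart, the nvidia runtime should be registered with the Docker daemon. A hedged sketch of checking this, assuming docker info --format '{{json .Runtimes}}' prints the runtimes as a JSON object keyed by runtime name; has_nvidia_runtime is a hypothetical helper:

```shell
# Succeed if the JSON on stdin contains a key named "nvidia".
# A plain text match, not a full JSON parse.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:'
}

# Only talk to Docker when the CLI is installed.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime not found (is the daemon running and nvidia-docker2 installed?)"
    fi
fi
```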

Usage

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

$ docker run --rm --gpus all nvidia/cuda:11.2.2-base-ubuntu18.04 nvidia-smi
Thu Mar 25 06:49:21 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    26W /  N/A |    301MiB /  5934MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Configuration

You can set nvidia as the default runtime; this is required when using docker-compose:

$ sudo mkdir -p /etc/docker
$ sudo tee /etc/docker/daemon.json <<-EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
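
To confirm the setting took effect, the configured value can be compared with what the daemon reports. A minimal sketch, assuming the key and its value sit on one line of daemon.json as in the file written above; default_runtime_of is a hypothetical helper:

```shell
# Print the "default-runtime" value from a daemon.json-style file ($1).
# A simple sed match that assumes key and value share one line; not a JSON parser.
default_runtime_of() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

conf=/etc/docker/daemon.json
if [ -r "$conf" ]; then
    echo "configured default runtime: $(default_runtime_of "$conf")"
fi

# Ask the running daemon which runtime it uses by default.
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{.DefaultRuntime}}' 2>/dev/null || true
fi
```

Both values should read nvidia once the daemon has been restarted.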
