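I went with installing the .run file by hand. I basically never upgrade or maintain this server, and I hold to the principle that once you start tinkering, something is bound to break, so I have given up on tinkering ((
I won't go over downloading the .run file; there are plenty of guides online, and it is right there on the NVIDIA website.
To install it on the PVE host, just make the file executable and run it.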
chmod +x NVIDIA-Linux-x86_64-550.142.run
./NVIDIA-Linux-x86_64-550.142.run
Then reboot the server and check that the driver is working.
root@pve:~# lspci -v | grep -i nv
06:00.0 Non-Volatile memory controller: Intel Corporation NVMe Optane Memory Series (prog-if 02 [NVM Express])
        Kernel driver in use: nvme
        Kernel modules: nvme
07:00.0 Non-Volatile memory controller: Intel Corporation NVMe Optane Memory Series (prog-if 02 [NVM Express])
        Kernel driver in use: nvme
        Kernel modules: nvme
81:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1) (prog-if 00 [VGA controller])
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
81:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
81:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
81:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)
        Kernel driver in use: nvidia-gpu
        Kernel modules: i2c_nvidia_gpu
root@pve:~# ls -l /dev/dri/
total 0
drwxr-xr-x 2 root root        80 Jan  5 12:40 by-path
crw-rw---- 1 root video  226,   0 Jan  5 12:40 card0
crw-rw---- 1 root render 226, 128 Jan  5 12:40 renderD128
root@pve:~# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan  1 18:18 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  1 18:18 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jan  1 18:18 /dev/nvidia-modeset
crw-rw-rw- 1 root root 508,   0 Jan  1 18:18 /dev/nvidia-uvm
crw-rw-rw- 1 root root 508,   1 Jan  1 18:18 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 511, 1 Jan  1 18:18 nvidia-cap1
cr--r--r-- 1 root root 511, 2 Jan  1 18:18 nvidia-cap2
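The major numbers in these listings (195, 226, 508 and 511 on my machine) are the ones the cgroup device rules in the container config have to match. The majors for nvidia-uvm and nvidia-caps are allocated dynamically, so they may well differ on another host. Here is a minimal sketch for reading them off, assuming a hypothetical helper called device_majors (my own name, not part of any NVIDIA or PVE tooling) that parses `ls -l` output from stdin:

```shell
#!/bin/sh
# device_majors: hypothetical helper for illustration only.
# Reads `ls -l` output on stdin and prints the unique major numbers of the
# character devices it finds, one per line, in ascending order.
device_majors() {
    # Character-device lines start with "c"; field 5 is the major plus a comma.
    awk '$1 ~ /^c/ { sub(/,$/, "", $5); print $5 }' | sort -un
}

# Usage on the host (commented out so the block has no side effects):
#   ls -l /dev/nvidia* /dev/dri/ | device_majors
```

Directory lines, `total` lines and the `/dev/nvidia-caps:` header are skipped because they do not start with `c`.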
Once the host is working properly, create a new LXC config file or edit the existing one, and add the following entries.
lxc.mount.auto: "proc:rw sys:rw"
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap1 dev/nvidia-caps/nvidia-cap1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap2 dev/nvidia-caps/nvidia-cap2 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.apparmor.profile: unconfined
lxc.cap.drop:
lxc.cgroup2.devices.allow: a
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
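The mount entries all share one pattern: the absolute path on the host, then the same path relative to the container root. That makes them easy to generate instead of typing by hand, which helps avoid copy-paste slips. A rough sketch, assuming a hypothetical helper called mount_entries that reads device paths from stdin:

```shell
#!/bin/sh
# mount_entries: hypothetical helper for illustration only.
# Reads absolute device paths on stdin (one per line) and prints one
# lxc.mount.entry line per path, using the same options as the config above.
mount_entries() {
    while IFS= read -r dev; do
        [ -n "$dev" ] || continue   # skip blank lines
        # The container-side target is the host path without its leading "/".
        printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' \
            "$dev" "${dev#/}"
    done
}

# Usage (commented out so the block has no side effects):
#   printf '%s\n' /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm | mount_entries
```

The output is meant to be reviewed and pasted into the container config, not appended blindly.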
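These config lines mount the device files into the LXC container and grant access to them. Note that PVE 8 already uses cgroup2, while some tutorials online still use lxc.cgroup.devices rather than lxc.cgroup2.devices, so pay close attention when following other people's material.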
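If you are adapting an older tutorial, the key can be renamed mechanically. This is only a sketch, assuming a hypothetical helper called to_cgroup2 that filters config lines from stdin. It renames the key and nothing else, so the device numbers still have to be checked against your own host:

```shell
#!/bin/sh
# to_cgroup2: hypothetical helper for illustration only.
# Rewrites cgroup v1 device keys at the start of a line to their cgroup2
# spelling; lines that already use lxc.cgroup2.devices pass through unchanged.
to_cgroup2() {
    sed 's/^lxc\.cgroup\.devices\./lxc.cgroup2.devices./'
}

# Usage (commented out so the block has no side effects):
#   to_cgroup2 < old-tutorial-snippet.conf
```

To see which hierarchy the host is actually running, `stat -fc %T /sys/fs/cgroup/` prints `cgroup2fs` on a cgroup v2 system.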
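Next, start the LXC container and install the same .run file inside it, this time with one extra flag: because the container shares the host's kernel, we don't need to install the kernel module (the DKMS part) in the container.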
./NVIDIA-Linux-x86_64-550.142.run --no-kernel-module
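The userspace libraries in the container need to be the same version as the kernel module loaded on the host, which is why the same installer is reused. Here is a small sketch for checking that before running it, assuming two hypothetical helpers of my own naming. installer_version pulls the version out of the installer file name. module_version expects the text of /proc/driver/nvidia/version on stdin and assumes the version is the first dotted-number token on the `NVRM version:` line:

```shell
#!/bin/sh
# Both helpers are hypothetical and for illustration only.

# installer_version: print the version embedded in an installer file name,
# e.g. the part after the last "-" and before ".run".
installer_version() {
    name=${1##*/}                 # drop any leading directory
    name=${name%.run}             # drop the .run suffix
    printf '%s\n' "${name##*-}"   # keep what follows the last "-"
}

# module_version: read /proc/driver/nvidia/version text on stdin and print
# the first token on the NVRM line that looks like a dotted version number.
# The layout of that line is an assumption; check it on your own host.
module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+(\.[0-9]+)+$/) { print $i; exit }
    }'
}

# Usage (commented out so the block has no side effects):
#   [ "$(installer_version ./NVIDIA-Linux-x86_64-550.142.run)" = \
#     "$(module_version < /proc/driver/nvidia/version)" ] && echo "versions match"
```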
Once the installation finishes, restart the LXC container, and you can run nvidia-smi to test it.
root@k8s:~# nvidia-smi
Sun Jan  5 12:50:18 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    Off |   00000000:81:00.0 Off |                  N/A |
|  0%   42C    P0             N/A /  125W |       1MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
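For a scripted check instead of reading the table by eye, nvidia-smi can print selected fields as CSV through its `--query-gpu` and `--format=csv,noheader` options. A minimal sketch, assuming a hypothetical helper called driver_versions that takes `name, driver_version` CSV lines on stdin:

```shell
#!/bin/sh
# driver_versions: hypothetical helper for illustration only.
# Reads "name, driver_version" CSV lines on stdin and prints the
# driver_version column, one line per GPU.
driver_versions() {
    # nvidia-smi separates CSV fields with a comma followed by a space.
    awk -F', ' 'NF >= 2 { print $2 }'
}

# Usage inside the container (commented out so the block has no side effects):
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | driver_versions
```

If this prints the same version as the installer you ran on the host, the container is talking to the host's kernel module.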