简介: Linux Enterprise Server 15 SP3配置安装kvm nvidia vGPU
更详细信息可以参考一下链接
然后启用IOMMU,添加到grub上
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-default [...] intel_iommu=on [...]
假如没有上述iommu字段,那就要添加/etc/default/grub
GRUB_CMDLINE_LINUX="intel_iommu=on"
GRUB_CMDLINE_LINUX="amd_iommu=on"
生成一个新的grub文件
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
然后禁用 nouveau kernel module
echo "blacklist nouveau" > /etc/modprobe.d/50-blacklist.conf
重启
在安装依赖包前,建议先安装国内源,且把旧的源删除
清理缓存
安装 kernel-default-devel
zypper update
zypper install kernel-default-devel dkms
zypper install -t pattern devel_C_C++ devel_kernely
切换init模式
安装vGPU驱动,这个驱动跟常规的驱动不一样,跟厂商那边获取
chmod +x NVIDIA-Linux-x86_64-470.82-vgpu-kvm.run
./NVIDIA-Linux-x86_64-470.82-vgpu-kvm.run --dkms
查看是否安装成功,假如能正确显示显卡信息,则代表安装成功
localhost:~
Tue Dec 7 16:14:42 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82 Driver Version: 470.82 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 On | 00000000:18:00.0 Off | Off |
| 52% 78C P0 203W / 230W | 23936MiB / 24258MiB | 89% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 On | 00000000:3B:00.0 Off | Off |
| 54% 78C P0 199W / 230W | 23936MiB / 24258MiB | 74% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 14174 C+G vgpu 23936MiB |
| 1 N/A N/A 14174 C+G vgpu 23936MiB |
+-----------------------------------------------------------------------------+
还需要查看nvidia module
localhost:~
nvidia_vgpu_vfio 69632 36
nvidia 35364864 3722
mdev 28672 2 vfio_mdev,nvidia_vgpu_vfio
vfio 40960 8 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
drm 614400 7 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,ttm
nvidia vGPU有两种模式:
首先A5000不支持MIG模式,以下输出信息可知
localhost:~
Unable to enable MIG Mode for GPU 00000000:18:00.0: Not Supported
Treating as warning and moving on.
All done.
所以我们需要使用的profile的配置方式
在刚安装完驱动后,先重启机器,重启后,启用显卡设备
sudo /usr/lib/nvidia/sriov-manage -e 00:18:0000.0
sudo /usr/lib/nvidia/sriov-manage -e 00:3b:0000.0
cd /sys/bus/pci/devices/0000:18:00.0/virtfn0/mdev_supported_types
for i in *; do echo "" $(cat $i/name) available: $(cat $i/avail*); done
NVIDIA RTXA5000-1B available: 0
NVIDIA RTXA5000-2B available: 0
NVIDIA RTXA5000-1Q available: 0
NVIDIA RTXA5000-2Q available: 0
NVIDIA RTXA5000-3Q available: 0
NVIDIA RTXA5000-4Q available: 0
NVIDIA RTXA5000-6Q available: 0
NVIDIA RTXA5000-8Q available: 0
NVIDIA RTXA5000-12Q available: 0
NVIDIA RTXA5000-24Q available: 0
NVIDIA RTXA5000-1A available: 0
NVIDIA RTXA5000-2A available: 0
NVIDIA RTXA5000-3A available: 0
NVIDIA RTXA5000-4A available: 0
NVIDIA RTXA5000-6A available: 0
NVIDIA RTXA5000-8A available: 0
NVIDIA RTXA5000-12A available: 0
NVIDIA RTXA5000-24A available: 0
NVIDIA RTXA5000-4C available: 0
NVIDIA RTXA5000-6C available: 0
NVIDIA RTXA5000-8C available: 0
NVIDIA RTXA5000-12C available: 0
NVIDIA RTXA5000-24C available: 0
uuidgen
f715f63c-0d00-4007-9c5a-b07b0c6c05de
sudo echo "f715f63c-0d00-4007-9c5a-b07b0c6c05de" > nvidia-666/create
sudo dmesg | tail
[...]
[ 3218.491843] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Adding to iommu group 322
[ 3218.499700] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: group_id = 322
[ 3599.608540] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Removing from iommu group 322
[ 3599.616753] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: detaching iommu
[ 3626.345530] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: Adding to iommu group 322
[ 3626.353383] vfio_mdev f715f63c-0d00-4007-9c5a-b07b0c6c05de: MDEV: group_id = 322