CVE-2025-23359 Vulnerability Fix Instructions

Last updated: 2025-03-27 19:50:53

Background

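When handling the CUDA forward compatibility feature, libnvidia-container (the library used by the NVIDIA Container Toolkit) mounts files from the container's /usr/local/cuda/compat directory into the container's lib directories (such as /usr/lib/x86_64-linux-gnu/). This mount behavior is susceptible to symlink attacks, which can cause arbitrary host directories to be mounted read-only into the container and thereby lead to container escape. For details, see the community description.
NVIDIA has released nvidia-container-toolkit 1.17.4 to fix this vulnerability.
All GPU nodes created on Tencent Cloud TKE before March 21, 2025 are potentially exposed to this vulnerability. We recommend upgrading nvidia-container-toolkit on your GPU nodes promptly to protect your workloads.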
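To decide whether a given node still runs an affected release, the installed version can be compared against 1.17.4. The following is a minimal sketch, not part of the official upgrade tool: the function names `version_lt` and `needs_upgrade` are hypothetical, and the comparison assumes GNU `sort -V` is available on the node.

```shell
#!/bin/sh
# Sketch: decide whether an nvidia-container-toolkit version predates the fixed release.
# FIXED_VERSION comes from the advisory; the function names are illustrative only.
FIXED_VERSION="1.17.4"

# version_lt A B: succeeds when version A sorts strictly before version B.
# Assumes GNU sort with -V (version sort) support.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# needs_upgrade VERSION: prints "yes" or "no".
# A package revision suffix such as "-1" is stripped before comparing.
needs_upgrade() {
    installed_version="${1%%-*}"
    if version_lt "$installed_version" "$FIXED_VERSION"; then
        echo "yes"
    else
        echo "no"
    fi
}

needs_upgrade "1.14.5-1"   # prints "yes"
needs_upgrade "1.17.4-1"   # prints "no"
```

The version string to pass in can be taken from the `dpkg -l` or `rpm -qa` output shown in the verification step of this document.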

Fix Instructions

1. Download the nvidia-container-toolkit upgrade tool.
wget https://blake-gz-1251707795.cos.ap-guangzhou.myqcloud.com/nv-runtime-upgrade-v2.tar.gz
2. Extract the tool archive.
tar -zxvf nv-runtime-upgrade-v2.tar.gz
3. Run the upgrade script.
Ubuntu images:
root@VM-4-8-ubuntu:~/nv-runtime-upgrade# ./upgrade-nv-runtime.sh
2025-02-21/16:05:11 INFO Need to upgrade nvidia-container-toolkit(1.14.5-1) to 1.17.4
(Reading database ... 137641 files and directories currently installed.)
Removing libnvidia-container-tools (1.14.5-1) ...
Removing libnvidia-container1:amd64 (1.14.5-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
2025-02-21/16:05:32 INFO start to install nvidia container toolkit
2025-02-21/16:05:33 INFO succeed to upgrade nvidia-container-toolkit
CentOS/TencentOS images:
[root@VM-5-10-tlinux nv-runtime-upgrade]# ./upgrade-nv-runtime.sh
2025-02-21/16:02:44 INFO Need to upgrade nvidia-container-toolkit(1.14.5) to 1.17.4
2025-02-21/16:02:44 INFO This node has been installed qgpu
2025-02-21/16:02:44 INFO current qgpu version is 2.2.0
2025-02-21/16:02:44 INFO backup up qgpu tools
2025-02-21/16:02:44 INFO success backup qgpu
No packages marked for removal.
2025-02-21/16:02:47 INFO start to install nvidia container toolkit
2025-02-21/16:02:48 INFO succeed to upgrade nvidia-container-toolkit
2025-02-21/16:02:48 INFO recover qgpu
2025-02-21/16:02:48 INFO success to recover qgpu
2025-02-21/16:02:48 INFO succeed to upgrade nvidia container toolkit
4. Check whether the upgrade succeeded.
Ubuntu images:
root@VM-4-8-ubuntu:~/nv-runtime-upgrade# dpkg -l |grep nvidia
ii libnvidia-container-tools 1.17.4-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.17.4-1 amd64 NVIDIA container runtime library
ii nvidia-container-runtime 3.14.0-1 all NVIDIA Container Toolkit meta-package
ii nvidia-container-toolkit 1.17.4-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.17.4-1 amd64 NVIDIA Container Toolkit Base
ii nvidia-docker2 2.14.0-1 all NVIDIA Container Toolkit meta-package
CentOS/TencentOS images:
[root@VM-5-10-tlinux ~]# rpm -qa|grep nvidia
libnvidia-container1-1.17.4-1.x86_64
nvidia-container-toolkit-1.17.4-1.x86_64
nvidia-container-toolkit-base-1.17.4-1.x86_64
libnvidia-container-tools-1.17.4-1.x86_64
nvidia-docker2-2.14.0-1.noarch
nvidia-container-runtime-3.14.0-1.noarch
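The check in step 4 can also be scripted instead of read by eye. The sketch below is an illustration under stated assumptions, not part of the upgrade tool: `list_packages` and `outdated_packages` are hypothetical helper names, only the four packages that share the toolkit's version numbering are checked (the nvidia-container-runtime and nvidia-docker2 meta-packages use different numbering and are skipped), and GNU `sort -V` is assumed.

```shell
#!/bin/sh
# Sketch: report toolkit packages that are still below the fixed version.
FIXED_VERSION="1.17.4"

# list_packages: print "name version" for every installed package,
# using whichever package database is present on the node.
list_packages() {
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package} ${Version}\n'
    elif command -v rpm >/dev/null 2>&1; then
        rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'
    fi
}

# outdated_packages: read "name version" lines on stdin and print those
# toolkit packages whose version sorts before FIXED_VERSION.
outdated_packages() {
    while read -r name version; do
        case "$name" in
            libnvidia-container1|libnvidia-container-tools) ;;
            nvidia-container-toolkit|nvidia-container-toolkit-base) ;;
            *) continue ;;
        esac
        upstream="${version%%-*}"   # drop the package revision, e.g. "-1"
        if [ "$upstream" != "$FIXED_VERSION" ]; then
            lowest="$(printf '%s\n%s\n' "$upstream" "$FIXED_VERSION" | sort -V | head -n 1)"
            if [ "$lowest" = "$upstream" ]; then
                echo "$name $version"
            fi
        fi
    done
    return 0
}

# Demonstration on sample data; prints "nvidia-container-toolkit 1.14.5-1".
printf '%s\n' \
    'nvidia-container-toolkit 1.14.5-1' \
    'libnvidia-container1 1.17.4-1' \
    'nvidia-docker2 2.14.0-1' | outdated_packages
```

On a real node, `list_packages | outdated_packages` printing nothing would suggest that all four packages are at 1.17.4 or later.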

Rollback Instructions

If you need to roll back the operations above, you can refer to the following commands:
if which yum &>/dev/null; then
    yum remove -y nvidia-container-runtime >/dev/null || true
    yum remove -y nvidia-container-toolkit >/dev/null || true
    yum remove -y nvidia-container-toolkit-base >/dev/null || true
    # Remove libnvidia
    yum remove -y libnvidia-container1 >/dev/null || true
elif which dpkg &>/dev/null; then
    apt-get --purge remove -y nvidia-container-runtime >/dev/null || true
    apt-get --purge remove -y nvidia-container-toolkit >/dev/null || true
    # Older versions may not include this package; a failure here is harmless
    apt-get --purge remove -y nvidia-container-toolkit-base >/dev/null || true
    # Remove libnvidia
    apt-get --purge remove -y libnvidia-container1 >/dev/null || true
fi
/usr/local/qcloud/gpu/install.sh
# If qgpu is present on this node, restore the qgpu tools from the backup directory
if [ -f /proc/qgpu/version ]; then
    echo "recover qgpu"
    if [ -d /usr/bin/qgpu_backup ]; then
        mv /usr/bin/nvidia-container-toolkit /usr/bin/nvidia-container-toolkit.backup
        mv /usr/bin/nvidia-container-runtime-hook /usr/bin/nvidia-container-runtime-hook.backup
        cp /usr/bin/qgpu_backup/* /usr/bin/
        echo "success to recover qgpu"
    else
        echo "failed to find qgpu backup dir, please restart qgpu-manager to recover qgpu"
    fi
fi
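Before running the rollback, it may help to see which branches the commands above would take on a particular node. The following read-only sketch mirrors the same checks without removing or copying anything. The helper names `pkg_manager` and `qgpu_state` are hypothetical; the default paths are the ones used in the rollback commands, and the optional arguments exist only so the function can be tried against other paths.

```shell
#!/bin/sh
# Sketch: preview which rollback branches apply on this node (read-only).

# pkg_manager: prints "yum", "apt" or "unknown", following the same
# order of checks as the rollback commands (yum first, then dpkg).
pkg_manager() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum"
    elif command -v dpkg >/dev/null 2>&1; then
        echo "apt"
    else
        echo "unknown"
    fi
}

# qgpu_state [VERSION_FILE] [BACKUP_DIR]: prints
#   "absent"         when qgpu is not present (no version file),
#   "backup-found"   when qgpu is present and the backup directory exists,
#   "backup-missing" when qgpu is present but the backup directory is gone.
qgpu_state() {
    version_file="${1:-/proc/qgpu/version}"
    backup_dir="${2:-/usr/bin/qgpu_backup}"
    if [ ! -f "$version_file" ]; then
        echo "absent"
    elif [ -d "$backup_dir" ]; then
        echo "backup-found"
    else
        echo "backup-missing"
    fi
}

echo "package manager: $(pkg_manager)"
echo "qgpu: $(qgpu_state)"
```

A result of "backup-missing" corresponds to the case where the rollback commands ask you to restart qgpu-manager to recover qgpu.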