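After upgrading from Ubuntu 18.04 to 20.04, TensorFlow can no longer use my GPU because it tries to load a mix of CUDA library versions (some version 10, some version 11). This is a System76 machine, and I installed CUDA 10.1 from System76 (so that it works with the System76 NVIDIA driver). Running TensorFlow produces the following errors: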
2021-01-07 18:12:22.584886: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:22.584906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-01-07 18:12:23.640665: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-07 18:12:23.641412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-07 18:12:23.669966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-07 18:12:23.670257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.733GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2021-01-07 18:12:23.670328: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670379: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670425: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.671387: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-07 18:12:23.671667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-07 18:12:23.673022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-07 18:12:23.673100: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.673245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-07 18:12:23.673259: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
Note that every warning is about an attempt to load a CUDA 11 library, but only some of the libraries are affected. The version 10 libraries load fine.
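To make the version mix easier to see, here is a minimal sketch that sorts the `dso_loader` messages into libraries that failed and libraries that opened. It assumes the two message formats shown in the log above; the function name `summarize_dso_log` and the regular expressions are my own and are not part of TensorFlow.

```python
import re

# Patterns for the two dso_loader messages that appear in the log above.
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")
OPENED = re.compile(r"Successfully opened dynamic library (\S+)")


def summarize_dso_log(log_text):
    """Return unique library names grouped by outcome, in order of first appearance."""
    result = {"failed": [], "opened": []}
    for line in log_text.splitlines():
        for key, pattern in (("failed", FAILED), ("opened", OPENED)):
            match = pattern.search(line)
            if match and match.group(1) not in result[key]:
                result[key].append(match.group(1))
    return result


# Three lines copied from the log above.
sample = "\n".join([
    "2021-01-07 18:12:23.670328: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory",
    "2021-01-07 18:12:23.670379: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory",
    "2021-01-07 18:12:23.671387: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10",
])

summary = summarize_dso_log(sample)
print("failed:", summary["failed"])
print("opened:", summary["opened"])
```

Feeding it the full log should list every library name TensorFlow asked for, which shows which sonames the installed TensorFlow build expects.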
Here is the output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 53C P0 26W / N/A | 585MiB / 6069MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2999 G /usr/lib/xorg/Xorg 101MiB |
| 0 N/A N/A 3479 G /usr/lib/xorg/Xorg 255MiB |
| 0 N/A N/A 3720 G /usr/bin/gnome-shell 88MiB |
| 0 N/A N/A 6487 G ...AAAAAAAA== --shared-files 45MiB |
| 0 N/A N/A 6959 G ...AAAAAAAA== --shared-files 40MiB |
| 0 N/A N/A 11642 G ...AAAAAAAA== --shared-files 21MiB |
| 0 N/A N/A 25206 G WickrMe 17MiB |
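+-----------------------------------------------------------------------------+
I see that nvidia-smi reports CUDA version 11.1, but as I understand it, that has nothing to do with the installed CUDA runtime. It is only the highest version the driver supports. Please correct me if I am wrong.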
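Since the number reported by nvidia-smi does not say which runtime libraries are installed, one way to check is to ask the dynamic loader directly. The sketch below uses Python's standard `ctypes.CDLL`, which fails with `OSError` when a library cannot be opened, roughly the situation the `dso_loader` warnings describe. The helper name `can_load` is hypothetical, and `libcudart.so.10.1` is my assumption about the soname of the CUDA 10.1 runtime; the other names come from the log.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open `soname`, else False."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


# Results depend entirely on what is installed on the machine.
for name in ("libcudart.so.10.1", "libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"):
    print(name, "->", "loadable" if can_load(name) else "not found")
```

If the version 11 names come back as not found while the 10.x ones load, the installed TensorFlow build is expecting a newer CUDA runtime than the one on the system.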
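I have to use version 10 because that is the version System76 supports, and it worked fine before the upgrade. I also tried uninstalling and reinstalling TensorFlow via pip3, without success.
Does anyone know how to get all of the libraries in sync at version 10.1? I also tried placing the version 11 libraries manually and letting TensorFlow use a mix of versions (certainly not a good idea), but it did not recognize them (or I did not place them correctly).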
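Answered 2021-01-09 00:03:44
As @talonmies pointed out, I had misunderstood how the versioning works. It is also confusing on a System76 machine, because System76 ships its own NVIDIA driver, and installing CUDA 11 and cuDNN is not straightforward. I am posting this answer in case anyone else runs into the same problem on System76.
First, do not use the System76 installs of CUDA and cuDNN. System76 provides its own versions (on its website) so that they are compatible with its NVIDIA driver, but they will not work here: they are version 10, whereas TF 2.4+ requires 11. Also, most generic CUDA guides tell you to uninstall and reinstall the NVIDIA driver first to get a clean install. On a System76 system, do not do that. Leave the System76 driver alone. Finally, if you have any previous CUDA/cuDNN installation, remove all of it.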
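Go to NVIDIA and get their latest CUDA and cuDNN. I used:
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
Run it with:
sudo sh cuda_11.0.2_450.51.05_linux.run
While it runs, it will tell you that you have a conflict with a driver package. Ignore that and continue. When you reach the install menu, uncheck the driver installation and proceed with the install. When it finishes, add the following to your PATH: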
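/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin:
You need to add both the CUDA root and bin, not just bin (this differs from most generic instructions). Source your .bashrc or .profile, or wherever you added the path (or open a new terminal).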
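A quick way to confirm that the new shell actually picked up both entries is to inspect `PATH` from Python. This is only a sketch: the helper `missing_path_entries` is a name I made up, and the two directories are the ones given above.

```python
import os


def missing_path_entries(path_value, required):
    """Return the entries from `required` that are absent from a PATH-style string."""
    entries = [entry.rstrip("/") for entry in path_value.split(os.pathsep) if entry]
    return [item for item in required if item.rstrip("/") not in entries]


# The two directories the answer says must both be on PATH.
REQUIRED = ["/usr/local/cuda-11.0", "/usr/local/cuda-11.0/bin"]

missing = missing_path_entries(os.environ.get("PATH", ""), REQUIRED)
if missing:
    print("Not on PATH yet:", missing)
else:
    print("Both CUDA 11.0 entries are on PATH.")
```

If anything is reported as missing, the edited file was probably not sourced in the current terminal.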
Now install cuDNN:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
Install it with dpkg. For example, in my case:
sudo dpkg -i libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
That is it. Once I had done all of this, everything ran fine. Hopefully this helps other System76 users get through Ubuntu 20.04 and CUDA 11 more easily.
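To check the result, you can ask TensorFlow which GPUs it sees. The sketch below assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the wrapper `visible_gpus` is a hypothetical helper that returns `None` when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("TensorFlow sees:", gpus)
else:
    print("No GPU visible; look for 'Could not load dynamic library' warnings in the log.")
```

An empty list together with new `dso_loader` warnings would point to a library that is still missing or not on the loader's search path.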
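Answered 2021-02-24 15:35:03
Thank you very much. One of the reasons I use Pop!_OS is that the NVIDIA driver + CUDA/cuDNN just worked with TensorFlow, until this problem with the missing version 11.0 libraries came up.
One extra thing I had to do when installing CUDA 11.0 with the method above was to install gcc version 8: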
sudo apt -y install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8
I really wish Pop!_OS would provide a CUDA 11.0 package directly.
https://stackoverflow.com/questions/65621855