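I installed TensorFlow 2.1, CUDA 10.1, cuDNN 7.6.5.32, and NVIDIA driver 430.5 on Ubuntu 18.04.3.
I could not follow the instructions on the TensorFlow website because many of the steps did not work, but after several hours I finally got everything installed. When I tried to run a 20-line MNIST example, I got: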
2020-02-19 03:02:24.915143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-02-19 03:02:24.915194: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-19 03:02:24.915216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-19 03:02:24.915234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-19 03:02:24.915253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-19 03:02:24.915271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-19 03:02:24.915289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-19 03:02:24.915308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-19 03:02:24.917997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-19 03:02:24.918060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-19 03:02:24.920974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-19 03:02:24.921000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-02-19 03:02:24.921013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-02-19 03:02:24.924091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10258 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Train on 60000 samples
Epoch 1/5
2020-02-19 03:02:26.155747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-19 03:02:26.156063: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156110: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156225: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156253: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156483: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.158110: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.158133: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-02-19 03:02:26.158158: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
[[{{node sequential/dense/MatMul}}]]
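I understand that this error probably means something is wrong with the installation, but how can I find out what is wrong? Is there a way to determine which version of cuDNN is actually being used?
I have googled this a lot. Many people report the same problem, but I have not found a solution.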
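One way to approach the cuDNN version question is sketched below. It is a minimal sketch, not a definitive diagnostic: the header locations and the library name `libcudnn.so.7` are assumptions about a typical cuDNN 7.x install (the log above does show `libcudnn.so.7` being opened), and the helper names are made up for illustration. `cudnnGetVersion()` is a real cuDNN function; for cuDNN 7.x it returns major * 1000 + minor * 100 + patch.

```python
import ctypes
import re


def parse_cudnn_header(text):
    """Extract (major, minor, patch) from the contents of cudnn.h, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + key + r"\s+(\d+)", text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def decode_cudnn_version(value):
    """Split the integer from cudnnGetVersion() (cuDNN 7.x encoding)."""
    return value // 1000, (value % 1000) // 100, value % 100


def loaded_cudnn_version(library="libcudnn.so.7"):
    """Ask the shared library the dynamic loader resolves; None if it cannot load."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    # Assumed header locations; adjust to wherever cudnn.h was installed.
    for path in ("/usr/include/cudnn.h", "/usr/local/cuda/include/cudnn.h"):
        try:
            with open(path) as handle:
                print(path, parse_cudnn_header(handle.read()))
        except OSError:
            print(path, "not found")
    print("loaded library:", loaded_cudnn_version())
```

If the header and the loaded library report different versions, more than one copy of cuDNN is probably installed, which would be worth ruling out before looking elsewhere.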
Posted on 2020-02-20 09:03:09
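I spent two days trying to get this junk to work. In the end, I will never know why it finally started working. I now know that everything I did was correct in principle, yet although the installation apparently succeeded, a simple MNIST example failed with CUBLAS_STATUS_NOT_INITIALIZED.
When I finally got it working, I: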
[...] before I started.
sudo apt-get install cuda-10-1
instead of:
sudo apt-get install cuda
This ensured that cuda was not automatically upgraded to the latest version (10.2 at the time of writing).
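To confirm that the CUDA runtime on the machine is still 10.1 and not 10.2, something like the following sketch could be used. It assumes the runtime library is named `libcudart.so.10.1` (the name that appears in the log in the question) and that it is on the loader path; the helper names are hypothetical. `cudaRuntimeGetVersion` is a real CUDA runtime call that writes an integer encoded as major * 1000 + minor * 10.

```python
import ctypes


def decode_cuda_version(value):
    """Split the integer from cudaRuntimeGetVersion(), e.g. 10010 -> (10, 1)."""
    return value // 1000, (value % 1000) // 10


def loaded_cuda_runtime_version(library="libcudart.so.10.1"):
    """Return (major, minor) of the CUDA runtime, or None if it cannot be queried."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    version = ctypes.c_int(0)
    status = lib.cudaRuntimeGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is cudaSuccess
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    print("CUDA runtime:", loaded_cuda_runtime_version())
```

A result of `None` means the 10.1 runtime could not be loaded under that name, which may indicate that a different CUDA version was installed.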
https://stackoverflow.com/questions/60292693