Q: Neural network test not working: CUBLAS_STATUS_EXECUTION_FAILED

Stack Overflow user

Asked on 2022-05-16 13:39:31

Answers: 1 | Views: 89 | Following: 0 | Votes: 0

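I have just started learning Python and deep learning with François Chollet's book "Deep Learning with Python", but the test example (p. 27) gets stuck in the first epoch.

I am on Windows, in a newly created Anaconda environment with only jupyter, spyder, tensorflow-gpu and keras-gpu installed. The Python version is 3.5.6, because versions 3.6 and above did not work.

What is wrong?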

    from keras.datasets import mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

    from keras import models
    from keras import layers
    network = models.Sequential()
    network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
    network.add(layers.Dense(10, activation='softmax'))

    network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

    train_images = train_images.reshape((60000, 28 * 28))
    train_images = train_images.astype('float32') / 255
    test_images = test_images.reshape((10000, 28 * 28))
    test_images = test_images.astype('float32') / 255

    from keras.utils import to_categorical
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)

    network.fit(train_images, train_labels, epochs=5, batch_size=128)

It gets stuck here:

Epoch 1/5

Error:

InternalError                             Traceback (most recent call last)
<ipython-input-1-63880145b61b> in <module>()
     19 test_labels = to_categorical(test_labels)
     20 
---> 21 network.fit(train_images, train_labels, epochs=5, batch_size=128)

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1035                                         initial_epoch=initial_epoch,
   1036                                         steps_per_epoch=steps_per_epoch,
-> 1037                                         validation_steps=validation_steps)
   1038 
   1039     def evaluate(self, x=None, y=None,

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\engine\training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    197                     ins_batch[i] = ins_batch[i].toarray()
    198 
--> 199                 outs = f(ins_batch)
    200                 outs = to_list(outs)
    201                 for l, o in zip(out_labels, outs):

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2664                 return self._legacy_call(inputs)
   2665 
-> 2666             return self._call(inputs)
   2667         else:
   2668             if py_any(is_tensor(x) for x in inputs):

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2634                                 symbol_vals,
   2635                                 session)
-> 2636         fetched = self._callable_fn(*array_vals)
   2637         return fetched[:len(self.outputs)]
   2638 

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1380           ret = tf_session.TF_SessionRunCallable(
   1381               self._session._session, self._handle, args, status,
-> 1382               run_metadata_ptr)
   1383         if run_metadata:
   1384           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

InternalError: Blas GEMM launch failed : a.shape=(128, 512), b.shape=(512, 10), m=128, n=10, k=512
     [[Node: dense_2/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/dense_2/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_1/Relu, dense_2/kernel/read)]]
     [[Node: metrics/acc/Mean/_53 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_362_metrics/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Jupyter notebook log:

2022-05-16 09:00:15.130789: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2022-05-16 09:00:15.256866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.665
pciBusID: 0000:07:00.0
totalMemory: 8.00GiB freeMemory: 6.99GiB
2022-05-16 09:00:15.256962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2022-05-16 09:00:15.802752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-16 09:00:15.802829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2022-05-16 09:00:15.803036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2022-05-16 09:00:15.803133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6724 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6)
2022-05-16 09:00:16.362403: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
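
For readers triaging a similar log, the two lines that matter most are the device line (GPU name and compute capability) and the cuBLAS failure line. Below is a minimal sketch for pulling them out. The `summarize_log` helper and its regular expressions are illustrative assumptions, written only against the log format shown above, and are not part of TensorFlow.

```python
import re

# Assumed patterns, matched against the TF 1.x-style log lines shown above.
DEVICE_RE = re.compile(
    r"name: (?P<name>.+?), pci bus id: [^,]+, "
    r"compute capability: (?P<major>\d+)\.(?P<minor>\d+)"
)
CUBLAS_RE = re.compile(
    r"failed to run cuBLAS routine (?P<routine>\w+): (?P<status>CUBLAS_STATUS_\w+)"
)


def summarize_log(text):
    """Return the GPU name, compute capability and cuBLAS failure found in text."""
    summary = {"device": None, "compute_capability": None, "cublas_failure": None}
    device = DEVICE_RE.search(text)
    if device:
        summary["device"] = device.group("name")
        summary["compute_capability"] = (
            int(device.group("major")),
            int(device.group("minor")),
        )
    failure = CUBLAS_RE.search(text)
    if failure:
        summary["cublas_failure"] = (failure.group("routine"), failure.group("status"))
    return summary


# Two lines copied from the log in the question.
SAMPLE = """
2022-05-16 09:00:15.803133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6724 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6)
2022-05-16 09:00:16.362403: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
"""

print(summarize_log(SAMPLE))
```

The compute capability is the value to compare against the CUDA toolkit that the installed TensorFlow build was compiled for.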

Additional configuration details, in case they are relevant:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0


conda list keras
# Name                    Version                   Build  Channel
keras-applications        1.0.4                    py35_1
keras-base                2.2.2                    py35_0
keras-gpu                 2.2.2                         0
keras-preprocessing       1.0.2                    py35_1

General environment:

numba -s

__Time Stamp__
Report started (local time)                   : 2022-05-16 09:11:51.541010
UTC start time                                : 2022-05-16 12:11:51.541010
Running time (s)                              : 4.864708

__Hardware Information__
Machine                                       : AMD64
CPU Name                                      : generic
CPU Count                                     : 12
Number of accessible CPUs                     : 12
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7 8 9 10 11
CFS Restrictions (CPUs worth of runtime)      : None

Memory Total (MB)                             : 16295
Memory Available (MB)                         : 7940

__OS Information__
Platform Name                                 : Windows-10-10.0.19043-SP0
Platform Release                              : 10
OS Name                                       : Windows
OS Version                                    : 10.0.19043
OS Specific Version                           : 10 10.0.19043 SP0 Multiprocessor Free
Libc Version                                  : ?

__Python Information__
Python Compiler                               : MSC v.1916 64 bit (AMD64)
Python Implementation                         : CPython
Python Version                                : 3.9.7

__Numba Toolchain Versions__
Numba Version                                 : 0.55.1
llvmlite Version                              : 0.38.0

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : (11, 7)
CUDA Runtime Version                          : 11070
CUDA NVIDIA Bindings Available                : False
CUDA NVIDIA Bindings In Use                   : False
CUDA Detect Output:
Found 1 CUDA devices
id 0    b'NVIDIA GeForce RTX 3060 Ti'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 7
                                    UUID: GPU-ac925103-25ce-7d00-18b3-0a068631b05d
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
Finding nvvm from CUDA_HOME
        named  nvvm64_40_0.dll
        trying to open library...       ok
Finding cudart from CUDA_HOME
        named  cudart64_110.dll
        trying to open library...       ok
Finding cudadevrt from CUDA_HOME
        named  cudadevrt.lib
        ERROR: failed to find cudadevrt:
cudadevrt.lib not found
Finding libdevice from CUDA_HOME
        searching for compute_20...     ok
        searching for compute_30...     ok
        searching for compute_35...     ok
        searching for compute_50...     ok


__SVML Information__
SVML State, config.USING_SVML                 : True
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : True

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : 3.21.8
Conda Env                                     : 4.12.0
Conda Platform                                : win-64
Conda Python Version                          : 3.9.7.final.0
Conda Root Writable                           : False

Tags: python, tensorflow, keras

1 Answer

Stack Overflow user

Answered on 2022-05-27 09:09:17

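This error is caused by a version incompatibility. According to the official documentation, recent TensorFlow releases are tested against CUDA 11.2, but you are using CUDA 11.7. Please uninstall CUDA 11.7 and install CUDA 10.1 from here, since it is compatible with TensorFlow 2.2. To set up the GPU, follow the instructions here. For more details, refer to the tested build configurations. Keep in mind that the RTX 3060 Ti in your log reports compute capability 8.6, which CUDA toolkits before 11.1 do not target natively, so a TensorFlow release tested against CUDA 11.x may be the better match for this card. Thanks!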
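
One way to sanity-check a toolkit choice against the GPU is to compare the card's compute capability with the first CUDA release that supported it. The sketch below is a rough aid under stated assumptions: the `MIN_CUDA_FOR_CAPABILITY` table is hand-written from NVIDIA's release history and should be verified against the CUDA release notes, and both helper names are hypothetical. `describe_tensorflow_build` uses `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices()` only when the installed TensorFlow provides them, since older releases do not.

```python
# First CUDA toolkit release that supported each compute capability.
# Hand-written assumption; verify against NVIDIA's CUDA release notes.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),
    (6, 1): (8, 0),
    (7, 0): (9, 0),
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def toolkit_supports(capability, cuda_version):
    """Return True or False if the capability is in the table, else None."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        return None
    return cuda_version >= minimum


def describe_tensorflow_build():
    """Collect version details from the installed TensorFlow, or None if absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = {"tensorflow": tf.__version__}
    # get_build_info() is only available in newer TensorFlow releases.
    sysconfig = getattr(tf, "sysconfig", None)
    get_build_info = getattr(sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        info["cuda"] = build.get("cuda_version")
        info["cudnn"] = build.get("cudnn_version")
    # list_physical_devices() is likewise missing from older releases.
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        info["gpus"] = [device.name for device in list_devices("GPU")]
    return info


# Compute capability 8.6 (RTX 3060 Ti) against the two toolkits mentioned above.
print(toolkit_supports((8, 6), (10, 1)))  # False
print(toolkit_supports((8, 6), (11, 2)))  # True
```

Calling `describe_tensorflow_build()` inside the affected conda environment shows which CUDA and cuDNN versions that TensorFlow build expects, which can then be compared with the tested build configurations.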

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/72260158
