Neural network test not working: CUBLAS_STATUS_EXECUTION_FAILED

Stack Overflow user
Asked 2022-05-16 13:39:31
Answers: 1 · Views: 89 · Followers: 0 · Votes: 0

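I have just started learning Python and deep learning with François Chollet's book "Deep Learning with Python", but the test example (p. 27) gets stuck in the first epoch.

This is on Windows, in a newly created Anaconda environment that has only jupyter, spyder, tensorflow-gpu and keras-gpu installed. The Python version is 3.5.6, because versions above 3.6 did not work.

What is wrong?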

    from keras.datasets import mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

    from keras import models
    from keras import layers
    network = models.Sequential()
    network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
    network.add(layers.Dense(10, activation='softmax'))

    network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

    train_images = train_images.reshape((60000, 28 * 28))
    train_images = train_images.astype('float32') / 255
    test_images = test_images.reshape((10000, 28 * 28))
    test_images = test_images.astype('float32') / 255

    from keras.utils import to_categorical
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)

    network.fit(train_images, train_labels, epochs=5, batch_size=128)

It gets stuck here:

Epoch 1/5

Error:

InternalError                             Traceback (most recent call last)
<ipython-input-1-63880145b61b> in <module>()
     19 test_labels = to_categorical(test_labels)
     20 
---> 21 network.fit(train_images, train_labels, epochs=5, batch_size=128)

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1035                                         initial_epoch=initial_epoch,
   1036                                         steps_per_epoch=steps_per_epoch,
-> 1037                                         validation_steps=validation_steps)
   1038 
   1039     def evaluate(self, x=None, y=None,

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\engine\training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    197                     ins_batch[i] = ins_batch[i].toarray()
    198 
--> 199                 outs = f(ins_batch)
    200                 outs = to_list(outs)
    201                 for l, o in zip(out_labels, outs):

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2664                 return self._legacy_call(inputs)
   2665 
-> 2666             return self._call(inputs)
   2667         else:
   2668             if py_any(is_tensor(x) for x in inputs):

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2634                                 symbol_vals,
   2635                                 session)
-> 2636         fetched = self._callable_fn(*array_vals)
   2637         return fetched[:len(self.outputs)]
   2638 

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1380           ret = tf_session.TF_SessionRunCallable(
   1381               self._session._session, self._handle, args, status,
-> 1382               run_metadata_ptr)
   1383         if run_metadata:
   1384           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\Users\\.conda\envs\tf-gpu-ide\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

InternalError: Blas GEMM launch failed : a.shape=(128, 512), b.shape=(512, 10), m=128, n=10, k=512
     [[Node: dense_2/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/dense_2/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_1/Relu, dense_2/kernel/read)]]
     [[Node: metrics/acc/Mean/_53 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_362_metrics/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Jupyter notebook log:

2022-05-16 09:00:15.130789: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2022-05-16 09:00:15.256866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.665
pciBusID: 0000:07:00.0
totalMemory: 8.00GiB freeMemory: 6.99GiB
2022-05-16 09:00:15.256962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2022-05-16 09:00:15.802752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-16 09:00:15.802829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2022-05-16 09:00:15.803036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2022-05-16 09:00:15.803133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6724 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6)
2022-05-16 09:00:16.362403: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED

In case it is relevant, here is additional information about the configuration:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0


conda list keras
# Name                    Version                   Build  Channel
keras-applications        1.0.4                    py35_1
keras-base                2.2.2                    py35_0
keras-gpu                 2.2.2                         0
keras-preprocessing       1.0.2                    py35_1
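
To compare the toolkit version above against a framework's requirements, the release number can be pulled out of the `nvcc -V` text with a regular expression. This is only a minimal sketch: the helper name `parse_nvcc_release` is hypothetical, and the sample string is copied from the output above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA toolkit release as a (major, minor) tuple, or None."""
    # nvcc prints a line such as "Cuda compilation tools, release 11.7, V11.7.64".
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Cuda compilation tools, release 11.7, V11.7.64"
print(parse_nvcc_release(sample))  # (11, 7)
```

Note that `nvcc` reports the system-wide toolkit. A conda-installed `tensorflow-gpu` may bring its own CUDA libraries into the environment, so this number is not necessarily the one TensorFlow loads.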

General environment:

numba -s

__Time Stamp__
Report started (local time)                   : 2022-05-16 09:11:51.541010
UTC start time                                : 2022-05-16 12:11:51.541010
Running time (s)                              : 4.864708

__Hardware Information__
Machine                                       : AMD64
CPU Name                                      : generic
CPU Count                                     : 12
Number of accessible CPUs                     : 12
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7 8 9 10 11
CFS Restrictions (CPUs worth of runtime)      : None

Memory Total (MB)                             : 16295
Memory Available (MB)                         : 7940

__OS Information__
Platform Name                                 : Windows-10-10.0.19043-SP0
Platform Release                              : 10
OS Name                                       : Windows
OS Version                                    : 10.0.19043
OS Specific Version                           : 10 10.0.19043 SP0 Multiprocessor Free
Libc Version                                  : ?

__Python Information__
Python Compiler                               : MSC v.1916 64 bit (AMD64)
Python Implementation                         : CPython
Python Version                                : 3.9.7

__Numba Toolchain Versions__
Numba Version                                 : 0.55.1
llvmlite Version                              : 0.38.0

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : (11, 7)
CUDA Runtime Version                          : 11070
CUDA NVIDIA Bindings Available                : False
CUDA NVIDIA Bindings In Use                   : False
CUDA Detect Output:
Found 1 CUDA devices
id 0    b'NVIDIA GeForce RTX 3060 Ti'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 7
                                    UUID: GPU-ac925103-25ce-7d00-18b3-0a068631b05d
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
Finding nvvm from CUDA_HOME
        named  nvvm64_40_0.dll
        trying to open library...       ok
Finding cudart from CUDA_HOME
        named  cudart64_110.dll
        trying to open library...       ok
Finding cudadevrt from CUDA_HOME
        named  cudadevrt.lib
        ERROR: failed to find cudadevrt:
cudadevrt.lib not found
Finding libdevice from CUDA_HOME
        searching for compute_20...     ok
        searching for compute_30...     ok
        searching for compute_35...     ok
        searching for compute_50...     ok


__SVML Information__
SVML State, config.USING_SVML                 : True
SVML Library Loaded                           : True
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : True

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : True
+-->Vendor: MS
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : 3.21.8
Conda Env                                     : 4.12.0
Conda Platform                                : win-64
Conda Python Version                          : 3.9.7.final.0
Conda Root Writable                           : False

1 Answer

Stack Overflow user

Answered 2022-05-27 09:09:17

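This error is caused by a version incompatibility. According to the official documentation, TensorFlow supports CUDA 11.2, but you are using CUDA 11.7. Please uninstall CUDA 11.7 and install CUDA 10.1 from here, because it is compatible with TensorFlow 2.2. To set up the GPU, follow the instructions here. For more details, refer to the tested build configurations. Thanks!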
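
The version check described in this answer can be sketched as a small lookup. The table below is a hand-copied, partial excerpt of TensorFlow's tested build configurations and should be verified against the official table before relying on it; `TESTED_CUDA` and `cuda_matches` are hypothetical names used only for illustration.

```python
# Partial excerpt of TensorFlow's tested GPU build configurations
# (TensorFlow release -> CUDA toolkit). An assumption to verify against
# the official table, not an authoritative list.
TESTED_CUDA = {
    "2.2": "10.1",
    "2.4": "11.0",
    "2.5": "11.2",
}


def cuda_matches(tf_version, cuda_version):
    """Return True or False for a listed release, or None if it is not listed."""
    # Reduce e.g. "2.2.0" to "2.2" so patch releases share one table entry.
    major_minor = ".".join(tf_version.split(".")[:2])
    expected = TESTED_CUDA.get(major_minor)
    if expected is None:
        return None
    return cuda_version == expected


print(cuda_matches("2.2.0", "11.7"))  # False
print(cuda_matches("2.2.0", "10.1"))  # True
```

With a real installation, the first argument would come from `tf.__version__`. A `None` result means the release is missing from this excerpt, not that it is unsupported.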

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/72260158
