While training a Transformer network for machine translation, the GPU throws the error below. Why does this happen?
Traceback (most recent call last):
File "D:/Transformer_MC__translation/model.py", line 64, in <module>
output = model(train, label)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "D:\Transformer_MC__translation\transformer.py", line 36, in call
enc_src = self.encoder(src, src_mask)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "D:\Transformer_MC__translation\encoder.py", line 23, in call
output = layer(output, output, output, mask)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "D:\Transformer_MC__translation\transformerblock.py", line 22, in call
x = self.dropout(self.norm1(attention+query))
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\layers\normalization.py", line 1293, in call
outputs, _, _ = nn.fused_batch_norm(
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 1660, in fused_batch_norm
y, running_mean, running_var, _, _, _ = gen_nn_ops.fused_batch_norm_v3(
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4255, in fused_batch_norm_v3
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\Devanshu\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\framework\ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([1,4928,256,1]) [Op:FusedBatchNormV3]
This is the encoder block:
import tensorflow as tf
from selfattention import SelfAttention
from transformerblock import TransformerBlock


class DecoderBlock(tf.keras.layers.Layer):
    def __init__(self, embed_size, head, forward_expansion, dropout):
        super(DecoderBlock, self).__init__()
        self.attention = SelfAttention(embed_size, head)
        self.norm = tf.keras.layers.LayerNormalization()
        self.transformer_block = TransformerBlock(
            embed_size, head, dropout=dropout, forward_expansion=forward_expansion
        )
        self.dropout = tf.keras.layers.Dropout(dropout)

    def call(self, inputs, key, value, src_mask, trg_mask):
        # masked self-attention over the decoder inputs
        attention = self.attention(inputs, inputs, inputs, trg_mask)
        # skip connection
        query = self.dropout(self.norm(attention + inputs))
        print(query.shape)
        output = self.transformer_block(value, key, query, src_mask)
        return output
The output shape of attention + inputs is (64, 80, 250), i.e. (batch size, sequence length, embedding size).
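As a quick sanity check, a minimal sketch with random tensors (not the asker's actual SelfAttention or TransformerBlock modules) shows that the skip connection and LayerNormalization preserve that (batch, seq_len, embed) shape:

import tensorflow as tf

# Hypothetical stand-ins for the attention output and decoder inputs,
# using the shape reported in the question: (batch, seq_len, embed).
attention = tf.random.normal((64, 80, 250))
inputs = tf.random.normal((64, 80, 250))

norm = tf.keras.layers.LayerNormalization()
dropout = tf.keras.layers.Dropout(0.1)

# Skip connection followed by normalization, as in DecoderBlock.call.
query = dropout(norm(attention + inputs), training=True)
print(query.shape)  # (64, 80, 250) -- the shape is unchanged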
Answered on 2021-09-21 04:19:27
Here are some things you can try to resolve the issue. I ran into this error once when I was trying to use a very large batch size, and fixed it by reducing the batch_size parameter and then increasing it gradually (2, 4, 8, 10, etc.). A mismatch between the versions of the dependencies can also cause it, so make sure everything (TF + cuDNN + CUDA) is installed correctly, and only fall back to reducing batch_size once you are sure the installation is correct.

In your case, I suspect the problem is due to a large batch size.
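For example, if the training data is batched with tf.data, lowering the batch size is a one-line change. The sketch below is a minimal illustration, not the asker's actual training loop; it reuses the train, label and model names from the question and picks an arbitrary starting batch size:

import tensorflow as tf

# Print the TF build info to confirm which CUDA / cuDNN versions this build
# expects (compare against what is actually installed on the machine).
print(tf.__version__)
print(tf.sysconfig.get_build_info().get("cuda_version"))
print(tf.sysconfig.get_build_info().get("cudnn_version"))

# Batch the (source, target) pairs with a small batch size first and
# increase it gradually (2, 4, 8, ...) to find what the GPU can handle.
batch_size = 2
dataset = (
    tf.data.Dataset.from_tensor_slices((train, label))  # tensors from the question
    .shuffle(10_000)
    .batch(batch_size)
)

for src_batch, trg_batch in dataset:
    output = model(src_batch, trg_batch)  # model as defined in the question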
https://stackoverflow.com/questions/69268651