
Loss does not converge when doing visual question answering with Keras

Stack Overflow user
Asked on 2020-01-07 19:45:14
1 answer · 59 views · 0 following · 0 votes

I am trying to train a neural network for visual question answering, but the loss keeps diverging. Basic hyperparameter changes gave no results, and I also tried different models, with no success. Below is one of the models I used:

# imports assumed (not shown in the original post) from tensorflow.keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Reshape, Conv2D, MaxPooling2D, Flatten,
                                     Dense, Embedding, LSTM, Dropout, concatenate)

word2vec_dim          = 30
num_hidden_nodes_mlp  = 1024
num_hidden_nodes_lstm = 30
num_layers_lstm       = 2
dropout               = 0.3
activation_mlp        = 'tanh'
num_epochs            = 1

image_model = Sequential()
image_model.add(Reshape(input_shape = (320,480,4), target_shape=(320,480,4)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(Flatten())
image_model.add(Dense(num_hidden_nodes_lstm, activation='relu'))

model1 = Model(inputs = image_model.input, outputs = image_model.output)
model1.summary()


language_model = Sequential()
language_model.add(Embedding(len(unique_words)+1, word2vec_dim, input_length=max_lenght))
language_model.add(LSTM(units=num_hidden_nodes_lstm, 
                        return_sequences=True, input_shape=(None, word2vec_dim)))

for i in range(num_layers_lstm-2):
    language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=True))
language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=False))

model2 = Model(language_model.input, language_model.output)
model2.summary()

combined = concatenate([image_model.output, language_model.output])
model = Dense(512, activation="tanh", kernel_initializer="uniform")(combined)
#model = Activation('tanh')(model)
model = Dropout(0.3)(model)

model = Dense(512, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)

#model = Dense(1024, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)

model = Dense(13, activation="softmax")(model)


model = Model(inputs=[image_model.input, language_model.input], outputs=model)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.summary()

Here is the training code. The dataset is split 80/20 and the batch size is 64. The epoch count is low, but since the dataset is large (about 3k batches), the loss explodes before even reaching 10% of a single epoch. The answer target classes are one-hot encoded, and the questions are encoded through a one-to-one dictionary mapping (one dictionary entry per word, since there are not many words), leaving 0 as the padding value. Embedded punctuation such as commas and question marks has been handled as well.
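The encoding scheme described above can be sketched as follows. This is a minimal, self-contained illustration (the vocabulary, `encode_question`, and `one_hot` names here are hypothetical, not from the original post): each word maps to a 1-based dictionary index, 0 is reserved for padding, and the answer class becomes a one-hot vector.

```python
# Toy vocabulary; the real one is built from the training questions.
unique_words = ["what", "color", "is", "the", "cube"]
word_to_index = {w: i + 1 for i, w in enumerate(unique_words)}  # 0 reserved for padding

def encode_question(question, max_length=8):
    # strip the question mark, map each word to its index, right-pad with zeros
    words = question.lower().replace("?", "").split()
    indices = [word_to_index[w] for w in words]
    return indices + [0] * (max_length - len(indices))

def one_hot(class_index, num_classes=13):
    # one-hot target vector for the answer class
    vec = [0] * num_classes
    vec[class_index] = 1
    return vec

print(encode_question("What color is the cube?"))  # [1, 2, 3, 4, 5, 0, 0, 0]
print(one_hot(2))
```

Note that a fresh list must be built for every sample; reusing a shared list for the one-hot targets is exactly the kind of bug the generator below runs into.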

train_gen = image_generator(batch_size=batch_size)
eval_gen = evaluation_generator(batch_size=batch_size)
model.fit(x=train_gen, epochs=2, verbose=1,
          validation_data=eval_gen,
          steps_per_epoch=training_batches,
          validation_steps=evaluation_batches,
          shuffle=True, max_queue_size=10,
          callbacks=[save])

I also get the following warning message:

/opt/conda/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

Epoch 1/2
 522/3243 [===>..........................] - ETA: 33:05 - loss: 2825421622922535501824.0000
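As a general mitigation for a loss that explodes like this (an assumption on my part, not something from the original post), one common step is to clip the gradient norm and lower the learning rate. The sketch below shows, in plain Python, what norm clipping does to a gradient vector:

```python
import math

def clip_by_norm(grad, max_norm=1.0):
    # Rescale the gradient so its L2 norm never exceeds max_norm;
    # gradients already within the threshold pass through unchanged.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grad]
    return list(grad)

print(clip_by_norm([3.0, 4.0]))  # norm 5.0, rescaled to norm 1.0, approx [0.6, 0.8]
```

In Keras this corresponds to passing `clipnorm` to the optimizer, e.g. `RMSprop(learning_rate=1e-4, clipnorm=1.0)` instead of the plain `optimizer='rmsprop'` used above.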

I have observed that the model answers every question with the same class (which I think is what causes the diverging loss).

where image_generator is defined as:

def my_hash(word):
    # maps a word to its 1-based index in the vocabulary (0 is reserved for padding)
    for x in range(dictionary_lenght-1):
        if word == unique_words[x]:
            return (x+1)
    print("Error, word not in the vocabulary")

def pad(sequence, lenght, value=0):
    # right-pads the sequence in place with `value` up to the requested length
    for x in range(len(sequence), lenght):
        sequence.append(value)
    return sequence

def image_generator(batch_size = 32):
    zeros=[0]*13
    while True:
        for x2 in range(training_batches):# Select files (paths/indices) for the batch
            input_img_batch = []
            input_question_batch = []
            output_batch = [] 
            img_name=""
            for x in range(batch_size):
                temp=[]
                img_name=training_data["questions"][x+x2*batch_size]["image_filename"]
                question=training_data["questions"][x+x2*batch_size]["question"].replace("?","")

                question=hashing_trick(question, dictionary_lenght,hash_function=my_hash)

                question=pad(question, max_lenght)
                img = Image.open("/kaggle/input/ann-and-dl-vqa/dataset_vqa/train/" + img_name, 'r')
                img = img.resize([img_width, img_height])
                img = np.asarray(img)
                img = img / 255  # normalize pixel values to [0, 1]
                input_img_batch.append(img)
                input_question_batch.append(question)
                dummy=zeros
                dummy[encode_answer(training_data["questions"][x+x2*batch_size]["answer"])]=1
                output_batch.append(dummy)

            # Return a tuple of (input,output) to feed the network
            batch_x1 = np.array( input_img_batch )
            batch_x2 = np.array( input_question_batch )
            batch_y = np.array( output_batch )

            yield( [batch_x1, batch_x2], batch_y )

1 Answer

Stack Overflow user

Answered on 2020-01-07 21:26:02

I solved it. There was a bug in image_generator: the `zeros` vector was somehow changing value until it became equal to `dummy` (rather than the other way around), which corrupted the prediction targets.
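The mechanism behind this is plain Python list aliasing: `dummy = zeros` does not copy the list, so writing into `dummy` mutates `zeros` too, and every subsequent "one-hot" target accumulates stale ones. A minimal reproduction and fix (the variable names mirror the generator above):

```python
# The bug: assignment aliases the same list object.
zeros = [0] * 13
dummy = zeros            # alias, NOT a copy
dummy[3] = 1
print(zeros[3])          # prints 1 -- zeros was mutated as well

# The fix: take a fresh copy for each sample.
zeros = [0] * 13
dummy = zeros.copy()     # or list(zeros), or [0] * 13 directly
dummy[3] = 1
print(zeros[3])          # prints 0 -- zeros is untouched
```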

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/59627809