
Loss does not converge when doing visual question answering with Keras

Stack Overflow user
Asked on 2020-01-07 19:45:14
1 answer · 59 views · 0 following · 0 votes

I am trying to train a neural network for visual question answering, but the loss keeps diverging. Basic hyperparameter changes gave no results, and I also tried different models, with no success. Below is one of the models I used:

# imports assumed (not shown in the original post) from tensorflow.keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Reshape, Conv2D, MaxPooling2D, Flatten,
                                     Dense, Embedding, LSTM, Dropout, concatenate)

word2vec_dim          = 30
num_hidden_nodes_mlp  = 1024
num_hidden_nodes_lstm = 30
num_layers_lstm       = 2
dropout               = 0.3
activation_mlp        = 'tanh'
num_epochs            = 1

image_model = Sequential()
image_model.add(Reshape(input_shape = (320,480,4), target_shape=(320,480,4)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(MaxPooling2D(pool_size=(2, 2)))
image_model.add(Conv2D(4,(3,1)))
image_model.add(Conv2D(4,(1,3)))
image_model.add(Flatten())
image_model.add(Dense(num_hidden_nodes_lstm, activation='relu'))

model1 = Model(inputs = image_model.input, outputs = image_model.output)
model1.summary()


language_model = Sequential()
language_model.add(Embedding(len(unique_words)+1, word2vec_dim, input_length=max_lenght))
language_model.add(LSTM(units=num_hidden_nodes_lstm, 
                        return_sequences=True, input_shape=(None, word2vec_dim)))

for i in range(num_layers_lstm-2):
    language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=True))
language_model.add(LSTM(units=num_hidden_nodes_lstm, return_sequences=False))

model2 = Model(language_model.input, language_model.output)
model2.summary()

combined = concatenate([image_model.output, language_model.output])
model = Dense(512, activation="tanh", kernel_initializer="uniform")(combined)
#model = Activation('tanh')(model)
model = Dropout(0.3)(model)

model = Dense(512, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)

#model = Dense(1024, activation="tanh", kernel_initializer="uniform")(model)
#model = Activation('tanh')(model)
#model = Dropout(0.5)(model)

model = Dense(13, activation="softmax")(model)


model = Model(inputs=[image_model.input, language_model.input], outputs=model)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.summary()

Here is the training code. The dataset is split 80/20 and the batch size is 64. The epoch count is low, but since the dataset is large (about 3k batches), the loss explodes before even reaching 10% of a single epoch. The answer target classes are one-hot encoded, and the questions are encoded through a one-to-one dictionary mapping (one dictionary entry per word, since there are not many words), leaving 0 as the padding value. Embedded punctuation such as commas and question marks has been handled as well.
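The encoding scheme described above can be sketched as follows. This is a minimal, self-contained illustration (the vocabulary, `encode_question`, and `one_hot` names here are hypothetical, not from the original post): each word maps to a 1-based dictionary index, 0 is reserved for padding, and the answer class becomes a one-hot vector.

```python
# Toy vocabulary; the real one is built from the training questions.
unique_words = ["what", "color", "is", "the", "cube"]
word_to_index = {w: i + 1 for i, w in enumerate(unique_words)}  # 0 reserved for padding

def encode_question(question, max_length=8):
    # strip the question mark, map each word to its index, right-pad with zeros
    words = question.lower().replace("?", "").split()
    indices = [word_to_index[w] for w in words]
    return indices + [0] * (max_length - len(indices))

def one_hot(class_index, num_classes=13):
    # one-hot target vector for the answer class
    vec = [0] * num_classes
    vec[class_index] = 1
    return vec

print(encode_question("What color is the cube?"))  # [1, 2, 3, 4, 5, 0, 0, 0]
print(one_hot(2))
```

Note that a fresh list must be built for every sample; reusing a shared list for the one-hot targets is exactly the kind of bug the generator below runs into.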

train_gen = image_generator(batch_size=batch_size)
eval_gen = evaluation_generator(batch_size=batch_size)
model.fit(x=train_gen, epochs=2, verbose=1,
          validation_data=eval_gen,
          steps_per_epoch=training_batches,
          validation_steps=evaluation_batches,
          shuffle=True, max_queue_size=10,
          callbacks=[save])

I also get the following warning message:

/opt/conda/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

Epoch 1/2
 522/3243 [===>..........................] - ETA: 33:05 - loss: 2825421622922535501824.0000
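As a general mitigation for a loss that explodes like this (an assumption on my part, not something from the original post), one common step is to clip the gradient norm and lower the learning rate. The sketch below shows, in plain Python, what norm clipping does to a gradient vector:

```python
import math

def clip_by_norm(grad, max_norm=1.0):
    # Rescale the gradient so its L2 norm never exceeds max_norm;
    # gradients already within the threshold pass through unchanged.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grad]
    return list(grad)

print(clip_by_norm([3.0, 4.0]))  # norm 5.0, rescaled to norm 1.0, approx [0.6, 0.8]
```

In Keras this corresponds to passing `clipnorm` to the optimizer, e.g. `RMSprop(learning_rate=1e-4, clipnorm=1.0)` instead of the plain `optimizer='rmsprop'` used above.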

I have observed that the model answers every question with the same class (which I think is what causes the diverging loss).

where image_generator is defined as:

def my_hash(word):
    # maps a word to its 1-based index in the vocabulary (0 is reserved for padding)
    for x in range(dictionary_lenght-1):
        if word == unique_words[x]:
            return (x+1)
    print("Error, word not in the vocabulary")

def pad(sequence, lenght, value=0):
    # right-pads the sequence in place with `value` up to the requested length
    for x in range(len(sequence), lenght):
        sequence.append(value)
    return sequence

def image_generator(batch_size = 32):
    zeros=[0]*13
    while True:
        for x2 in range(training_batches):# Select files (paths/indices) for the batch
            input_img_batch = []
            input_question_batch = []
            output_batch = [] 
            img_name=""
            for x in range(batch_size):
                temp=[]
                img_name=training_data["questions"][x+x2*batch_size]["image_filename"]
                question=training_data["questions"][x+x2*batch_size]["question"].replace("?","")

                question=hashing_trick(question, dictionary_lenght,hash_function=my_hash)

                question=pad(question, max_lenght)
                img = Image.open("/kaggle/input/ann-and-dl-vqa/dataset_vqa/train/" + img_name, 'r')
                img = img.resize([img_width, img_height])
                img = np.asarray(img)
                img = img / 255  # normalize pixel values to [0, 1]
                input_img_batch.append(img)
                input_question_batch.append(question)
                dummy=zeros
                dummy[encode_answer(training_data["questions"][x+x2*batch_size]["answer"])]=1
                output_batch.append(dummy)

            # Return a tuple of (input,output) to feed the network
            batch_x1 = np.array( input_img_batch )
            batch_x2 = np.array( input_question_batch )
            batch_y = np.array( output_batch )

            yield( [batch_x1, batch_x2], batch_y )

1 Answer

Stack Overflow user

Answered on 2020-01-07 21:26:02

I solved it. There was a bug in image_generator: the `zeros` vector was somehow changing value until it became equal to `dummy` (rather than the other way around), which corrupted the prediction targets.
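The mechanism behind this is plain Python list aliasing: `dummy = zeros` does not copy the list, so writing into `dummy` mutates `zeros` too, and every subsequent "one-hot" target accumulates stale ones. A minimal reproduction and fix (the variable names mirror the generator above):

```python
# The bug: assignment aliases the same list object.
zeros = [0] * 13
dummy = zeros            # alias, NOT a copy
dummy[3] = 1
print(zeros[3])          # prints 1 -- zeros was mutated as well

# The fix: take a fresh copy for each sample.
zeros = [0] * 13
dummy = zeros.copy()     # or list(zeros), or [0] * 13 directly
dummy[3] = 1
print(zeros[3])          # prints 0 -- zeros is untouched
```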

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/59627809