# Getting ready

• Install TensorFlow;
• Install the pandas, opencv2, and Jupyter libraries;
• Download the image embeddings and image captions for the Flickr30k dataset

# Image caption generation as an extension of image classification

1. How do we build on existing, successful image classification models to extract the important information from an image?

2. Once our model understands the image, how does it fuse that information to generate a caption?

# Applying transfer learning

```python
def get_data(annotation_path, feature_path):
    ...
```
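The body of `get_data` is not shown above. As a sketch of what it might look like, assuming the common Flickr30k layout (a tab-separated caption file paired with a NumPy array holding one extracted feature vector per caption; the file names and column layout here are illustrative):

```python
import os
import tempfile

import numpy as np
import pandas as pd

def get_data(annotation_path, feature_path):
    # the annotation file is assumed to be tab-separated: image name, caption
    annotations = pd.read_table(annotation_path, sep='\t', header=None,
                                names=['image', 'caption'])
    # the features are assumed to be a (num_captions, dim_in) NumPy array
    return np.load(feature_path), annotations['caption'].values

# tiny synthetic example: two captions paired with two 4-dimensional feature vectors
tmp = tempfile.mkdtemp()
ann_path = os.path.join(tmp, 'results.token')
feat_path = os.path.join(tmp, 'feats.npy')
with open(ann_path, 'w') as f:
    f.write('1.jpg#0\ta dog runs\n1.jpg#1\ta dog is running\n')
np.save(feat_path, np.zeros((2, 4), dtype=np.float32))

feats, captions = get_data(ann_path, feat_path)
print(feats.shape, captions[0])  # (2, 4) a dog runs
```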

# Building and training the model

```python
def build_model(self):
    # declaring the placeholders for our extracted image feature vectors,
    # our caption, and our mask (describes how long our caption is with an
    # array of 0/1 values of length `maxlen`)
    img = tf.placeholder(tf.float32, [self.batch_size, self.dim_in])
    caption_placeholder = tf.placeholder(tf.int32, [self.batch_size, self.n_lstm_steps])
    mask = tf.placeholder(tf.float32, [self.batch_size, self.n_lstm_steps])

    # getting an initial LSTM embedding from our image embedding
    image_embedding = tf.matmul(img, self.img_embedding) + self.img_embedding_bias

    # setting the initial state of our LSTM
    state = self.lstm.zero_state(self.batch_size, dtype=tf.float32)

    total_loss = 0.0
    with tf.variable_scope("RNN"):
        for i in range(self.n_lstm_steps):
            if i > 0:
                # if this isn't the first iteration of our LSTM we need to get the
                # word embedding corresponding to the (i-1)th word in our caption
                with tf.device("/cpu:0"):
                    current_embedding = tf.nn.embedding_lookup(
                        self.word_embedding, caption_placeholder[:, i-1]) + self.embedding_bias
            else:
                # if this is the first iteration of our LSTM we utilize the
                # embedded image as our input
                current_embedding = image_embedding
            if i > 0:
                # allows us to reuse the LSTM tensor variable on each iteration
                tf.get_variable_scope().reuse_variables()

            out, state = self.lstm(current_embedding, state)

            if i > 0:
                # get the one-hot representation of the next word in our caption
                labels = tf.expand_dims(caption_placeholder[:, i], 1)
                ix_range = tf.range(0, self.batch_size, 1)
                ixs = tf.expand_dims(ix_range, 1)
                concat = tf.concat([ixs, labels], 1)
                onehot = tf.sparse_to_dense(
                    concat, tf.stack([self.batch_size, self.n_words]), 1.0, 0.0)

                # perform a softmax classification to generate the next word in the caption
                logit = tf.matmul(out, self.word_encoding) + self.word_encoding_bias
                xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=onehot)
                # mask out the loss at padding positions of the caption
                xentropy = xentropy * mask[:, i]

                loss = tf.reduce_sum(xentropy)
                total_loss += loss

    # normalize by the number of real (non-padding) words and return the
    # handles needed for training
    total_loss = total_loss / tf.reduce_sum(mask[:, 1:])
    return total_loss, img, caption_placeholder, mask
```
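The `tf.sparse_to_dense` call in the loop is simply building a one-hot target matrix: each batch row index is paired with the id of the next word, and a 1.0 is scattered into that position. A minimal NumPy sketch of the same indexing trick (the batch size, vocabulary size, and labels here are made up for illustration):

```python
import numpy as np

batch_size, n_words = 4, 6
# next-word ids for each example in the batch (analogous to caption_placeholder[:, i])
labels = np.array([2, 0, 5, 3])

# pair each row index with its label column and scatter 1.0 into a dense matrix,
# mirroring tf.sparse_to_dense(concat, [batch_size, n_words], 1.0, 0.0)
onehot = np.zeros((batch_size, n_words), dtype=np.float32)
onehot[np.arange(batch_size), labels] = 1.0

print(onehot[0])  # [0. 0. 1. 0. 0. 0.]
```

Each row of `onehot` then serves as the softmax target for one example in the batch.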

# Generating captions through inference

```python
def build_generator(self, maxlen, batchsize=1):
    # same setup as the `build_model` function
    img = tf.placeholder(tf.float32, [self.batch_size, self.dim_in])
    image_embedding = tf.matmul(img, self.img_embedding) + self.img_embedding_bias
    state = self.lstm.zero_state(batchsize, dtype=tf.float32)

    # declare a list to hold the words of our generated captions
    all_words = []
    with tf.variable_scope("RNN"):
        # in the first iteration we have no previous word, so we directly pass in
        # the image embedding and set `previous_word` to the embedding of the
        # start token ([0]) for the future iterations
        output, state = self.lstm(image_embedding, state)
        previous_word = tf.nn.embedding_lookup(self.word_embedding, [0]) + self.embedding_bias

        for i in range(maxlen):
            tf.get_variable_scope().reuse_variables()

            out, state = self.lstm(previous_word, state)

            # get a one-hot word encoding from the output of the LSTM
            logit = tf.matmul(out, self.word_encoding) + self.word_encoding_bias
            best_word = tf.argmax(logit, 1)

            with tf.device("/cpu:0"):
                # get the embedding of best_word to use as input to the next
                # iteration of our LSTM
                previous_word = tf.nn.embedding_lookup(self.word_embedding, best_word)

            previous_word += self.embedding_bias

            all_words.append(best_word)

    return img, all_words
```
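The generator performs greedy decoding: at every step it takes the argmax word from the softmax projection and feeds its embedding back in as the next input. A toy NumPy version of that feedback loop, with an invented stand-in for the LSTM cell and randomly initialized matrices (everything here is illustrative, not the model's real weights):

```python
import numpy as np

vocab, dim = 5, 3
rng = np.random.RandomState(0)
word_embedding = rng.randn(vocab, dim)   # plays the role of self.word_embedding
word_encoding = rng.randn(dim, vocab)    # plays the role of self.word_encoding

def step(x, state):
    # stand-in for the LSTM cell: mixes the input into the running state
    state = np.tanh(x + state)
    return state, state

state = np.zeros(dim)
previous_word = word_embedding[0]        # start-token embedding
all_words = []
for _ in range(4):                       # maxlen = 4
    out, state = step(previous_word, state)
    logit = out @ word_encoding                 # project to vocabulary scores
    best_word = int(np.argmax(logit))           # greedy choice
    previous_word = word_embedding[best_word]   # feed back as the next input
    all_words.append(best_word)

print(all_words)
```

Greedy decoding is the simplest strategy; beam search, which keeps several candidate captions alive at once, usually produces better captions at higher cost.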

# Next steps
