Testing the Image Captioning Model on TensorFlow 1.0 (model/im2txt)

sparkexpert
Published 2018-01-09 11:50:58

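Automatic image captioning (generating a natural-language description of an image) is a classic multimodal task and has long been at the research frontier, since it combines computer vision with natural language processing.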

Fortunately, Google has open-sourced an implementation built on TensorFlow: https://github.com/tensorflow/models/tree/master/im2txt#generating-captions

The project describes the model as follows:

The Show and Tell model is a deep neural network that learns how to describe the content of images.

And its architecture:

The Show and Tell model is an example of an encoder-decoder neural network. It works by first "encoding" an image into a fixed-length vector representation, and then "decoding" the representation into a natural language description.

The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state-of-the-art for object recognition and detection. Our particular choice of network is the Inception v3 image recognition model pretrained on the ILSVRC-2012-CLS image classification dataset.

The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.

Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed-length vector representation that is learned during training.

The following diagram illustrates the model architecture.

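In short, the architecture pairs an Inception v3 encoder with an LSTM decoder: the image's vector representation and the word embeddings of its caption are both fed into the model. (The full model is documented on the GitHub page and is a fairly classic design.)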
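To make the data flow concrete, here is a toy, framework-free sketch of how one training example could be prepared for the decoder. It is not im2txt's code: plain Python lists stand in for tensors, `image_vec` stands in for the Inception v3 output, and the embedding table, projection matrix, token ids and the helper name `prepare_decoder_inputs` are all hypothetical. As far as I know, im2txt feeds the projected image embedding to the LSTM once to produce its initial state, then runs the LSTM over the embeddings of `[<S>, w1, ..., wN]` with targets `[w1, ..., wN, </S>]`; the LSTM itself is omitted here.

```python
def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def prepare_decoder_inputs(image_vec, caption_ids, embedding_table, image_proj):
    """Build the decoder inputs for one (hypothetical) training example.

    caption_ids is assumed to already be wrapped as [<S>, w1, ..., wN, </S>].
    Returns the image embedding (fed to the LSTM once, before any word),
    the embedded input words, and the target word ids.
    """
    # Project the image vector into the same space as the word embeddings.
    image_embedding = matvec(image_proj, image_vec)
    # Teacher forcing: at step t the decoder reads word t and predicts word t+1.
    input_ids = caption_ids[:-1]
    target_ids = caption_ids[1:]
    # Each word id is replaced by its (learned) embedding vector.
    input_embeddings = [embedding_table[i] for i in input_ids]
    return image_embedding, input_embeddings, target_ids


if __name__ == "__main__":
    # Hypothetical vocabulary: 0=<S>, 1=</S>, 2="a", 3="dog"
    table = [[0.1, 0.0], [0.0, 0.1], [0.5, 0.5], [0.9, 0.2]]
    proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 3-dim image vector -> 2-dim
    img, inputs, targets = prepare_decoder_inputs(
        [0.2, 0.3, 0.4], [0, 2, 3, 1], table, proj)
    print(img, inputs, targets)
```

The shift by one position between inputs and targets is what turns the LSTM into a language model conditioned on the image.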

2. Experiments

For the experiments I used a pretrained model. Because this article runs on TensorFlow 1.0, however, a few pitfalls have to be worked around first:

(1) Fixing word_counts.txt: every entry of the form b'str' must be turned into plain str, i.e. the b prefix and the surrounding quotes have to be stripped.

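(2) Renaming variables in the pretrained model: the variable names stored in the pretrained checkpoint do not match the names the TensorFlow 1.0 code expects, so they must be renamed.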

In the code, I added a function that renames the variables and re-saves the model:

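    # Variable names changed between TensorFlow versions, so the pretrained
    # checkpoint is re-saved with the LSTM variables renamed.
    import tensorflow as tf

    FLAGS = tf.flags.FLAGS  # same flags object used by im2txt's run_inference.py

    def RenameCkpt():
        # Old (pre-1.0) LSTM variable names -> names expected by TensorFlow 1.0
        vars_to_rename = {
            "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
            "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
        }
        new_checkpoint_vars = {}
        # Read every tensor stored in the original checkpoint
        reader = tf.train.NewCheckpointReader(FLAGS.checkpoint_path)
        for old_name in reader.get_variable_to_shape_map():
            # Use the new name if one is defined, otherwise keep the old one
            new_name = vars_to_rename.get(old_name, old_name)
            new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))
        init = tf.global_variables_initializer()
        # The dict keys become the variable names inside the saved checkpoint
        saver = tf.train.Saver(new_checkpoint_vars)
        with tf.Session() as sess:
            sess.run(init)
            saver.save(sess, "/home/ndscbigdata/work/change/tf/gan/im2txt/ckpt/newmodel.ckpt-2000000")
        print("checkpoint file rename successful... ")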
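Both pitfalls can be handled with small pure-Python helpers; the sketch below is my own, not part of im2txt, and the names `clean_vocab_line`, `clean_vocab_file` and `plan_renames` are hypothetical. For pitfall (1) it assumes each line of word_counts.txt is `word count` separated by a single space, with the word presumably written out as a Python 3 bytes repr such as `b'dog'` (or `b"'s"` when the word itself contains a single quote); `ast.literal_eval` recovers the bytes, which are then decoded as UTF-8. For pitfall (2), `plan_renames` previews what `RenameCkpt` would do and flags mapping entries that do not exist in the checkpoint, which is a cheap way to catch a wrong old name before saving anything.

```python
import ast


def clean_vocab_line(line):
    """Pitfall (1): turn "b'dog' 42" into "dog 42"; clean lines pass through."""
    word, count = line.rstrip("\n").rsplit(" ", 1)
    # A bytes repr looks like b'dog', or b"'s" when the word contains a quote.
    if len(word) >= 3 and word[0] == "b" and word[1] in "'\"" and word[-1] == word[1]:
        word = ast.literal_eval(word).decode("utf-8")
    return f"{word} {count}"


def clean_vocab_file(src_path, dst_path):
    """Rewrite a whole word_counts.txt, skipping blank lines."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(clean_vocab_line(line) + "\n")


def plan_renames(checkpoint_names, vars_to_rename):
    """Pitfall (2): preview the renaming that RenameCkpt performs.

    Returns the (old, new) name pairs plus any mapping keys that are missing
    from the checkpoint; raises if two variables would share a new name.
    """
    plan = [(old, vars_to_rename.get(old, old)) for old in checkpoint_names]
    new_names = [new for _, new in plan]
    duplicates = sorted({n for n in new_names if new_names.count(n) > 1})
    if duplicates:
        raise ValueError(f"renaming would create duplicate names: {duplicates}")
    missing = sorted(set(vars_to_rename) - set(checkpoint_names))
    return plan, missing
```

Inside `RenameCkpt`, one could call `plan_renames(list(reader.get_variable_to_shape_map()), vars_to_rename)` before creating any variables; a non-empty `missing` list means the checkpoint uses different names than expected.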

Experiment details:

(1) Set a few parameters manually

    FLAGS.checkpoint_path = "/home/ndscbigdata/work/change/tf/gan/im2txt/ckpt/newmodel.ckpt-2000000"
    FLAGS.vocab_file = "./data/volab.txt"
    FLAGS.input_files = "./data/COCO_val2014_000000224477.jpg,./data/ep271.jpg,./data/dog.jpg"

(2) Test images

Captions for image COCO_val2014_000000224477.jpg:
  0) a man riding a wave on top of a surfboard . (probability=0.035672)
  1) a person riding a surf board on a wave (probability=0.016238)
  2) a man on a surfboard riding a wave . (probability=0.010146)

Captions for image ep271.jpg:
  0) a woman is standing next to a horse . (probability=0.000759)
  1) a woman is standing next to a horse (probability=0.000647)
  2) a woman is standing next to a brown horse . (probability=0.000384)

Captions for image dog.jpg:
  0) a dog is eating a slice of pizza . (probability=0.000138)
  1) a dog is eating a slice of pizza on a plate . (probability=0.000047)
  2) a dog is sitting at a table with a pizza on it . (probability=0.000039)

Note: the last image is Google's classic test image, and the generated captions for it are quite satisfactory.
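The three captions per image and their small probabilities are consistent with beam search. As far as I know, im2txt's caption generator defaults to a beam size of 3, and run_inference.py prints `math.exp` of each caption's summed log-probability. A caption's probability is therefore a product of per-word probabilities: ten words at 0.5 each already give 0.5^10 ≈ 0.00098, so a value like 0.000138 is not by itself a sign of a bad caption. Below is a minimal, generic beam search sketch, not im2txt's implementation; `beam_search`, `toy_model` and `TOY_TABLE` are hypothetical, and the toy model conditions only on the previous word instead of on an image.

```python
import math


def beam_search(next_word_logprobs, start, end, beam_size=3, max_len=20):
    """Return up to beam_size captions as (words, probability), best first.

    next_word_logprobs(prefix) must return {word: log p(word | prefix)}.
    """
    beams = [([start], 0.0)]  # (partial caption, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            for word, logp in next_word_logprobs(words).items():
                # Log-probabilities add, so probabilities multiply.
                candidates.append((words + [word], score + logp))
        # Keep only the beam_size best expansions overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_size]:
            if words[-1] == end:
                finished.append((words, score))
            else:
                beams.append((words, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished captions if max_len was reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return [([w for w in words[1:] if w != end], math.exp(score))
            for words, score in finished[:beam_size]]


# Hypothetical toy "decoder": the next-word distribution depends only on the last word.
TOY_TABLE = {
    "<S>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.7, "cat": 0.3},
    "the": {"dog": 0.5, "cat": 0.5},
    "dog": {"</S>": 1.0},
    "cat": {"</S>": 1.0},
}


def toy_model(prefix):
    return {w: math.log(p) for w, p in TOY_TABLE[prefix[-1]].items()}


if __name__ == "__main__":
    for i, (words, prob) in enumerate(beam_search(toy_model, "<S>", "</S>")):
        print(f"  {i}) {' '.join(words)} (probability={prob:.6f})")
```

With real captions of eight to twelve words the same multiplication quickly drives probabilities into the 1e-3 to 1e-5 range seen above.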

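Unfortunately my hardware is too weak for training; otherwise the model could be trained with Inception v4 as the encoder, which should give better results. Generating Chinese captions is another possible extension.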

The modified source code will be published on my GitHub; you are welcome to download it there: https://github.com/ndscigdata

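This article is part of the Tencent Cloud self-media sharing program and is shared from the author's personal site/blog.
Originally published 2017-04-27. For infringement concerns, contact cloudcommunity@tencent.com for removal.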
