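[Paper Recommendations] Five Recent Papers on Image Caption Generation: Sentiment, Attention Mechanisms, Remote Sensing Images, Sequence-to-Sequence, Deep Neural Architectures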

[Overview] The Zhuanzhi content team has compiled five recent papers related to image caption generation (Image Caption) and introduces them below.

1. Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Authors: Quanzeng You, Hailin Jin, Jiebo Luo

Abstract: Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. For human beings, by contrast, image caption writing is quite flexible and diverse, where additional language dimensions, such as emotion, humor and language styles, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic matching between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our models outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can also easily manipulate the model by assigning different sentiments to the testing image to generate captions with the corresponding sentiments.

Published in: arXiv, January 31, 2018

Link:

http://www.zhuanzhi.ai/document/71ae60a957ad68f4a80e330d05e67ef0

2. Order-Free RNN with Visual Attention for Multi-Label Classification

Authors: Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang

Abstract: In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.

Published in: arXiv, December 20, 2017

Link:

http://www.zhuanzhi.ai/document/0a34485f1b9e7e60314cc7ebe21d0add
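
The abstract of paper 2 mentions beam search for predicting multiple labels. As a rough illustration only, the sketch below shows plain, generic beam search over label sequences; it is not the authors' extended variant, which the abstract does not describe. The scoring function `toy_model`, its probabilities, and the `<stop>` label are all hypothetical stand-ins for a trained attention-plus-LSTM model.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Prefix], Dict[str, float]],
    beam_width: int,
    max_steps: int,
    stop: str = "<stop>",
) -> List[Tuple[Prefix, float]]:
    """Return label sequences sorted by total log-probability (best first)."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_steps):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, score in beams:
            for label, log_p in next_log_probs(prefix).items():
                if label in prefix:
                    continue  # in multi-label output, each label appears at most once
                if label == stop:
                    finished.append((prefix, score + log_p))
                else:
                    candidates.append((prefix + (label,), score + log_p))
        # Keep only the beam_width highest-scoring open sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    # Sequences still open after max_steps are kept without a stop score.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical next-label distribution; a real model would use image features."""
    if not prefix:
        probs = {"person": 0.7, "dog": 0.2, "<stop>": 0.1}
    elif prefix[-1] == "person":
        probs = {"dog": 0.5, "frisbee": 0.2, "<stop>": 0.3}
    else:
        probs = {"person": 0.2, "frisbee": 0.1, "<stop>": 0.7}
    return {label: math.log(p) for label, p in probs.items()}


results = beam_search(toy_model, beam_width=2, max_steps=3)
best_labels, best_score = results[0]
print(best_labels, math.exp(best_score))
```

With these made-up probabilities, the highest-scoring finished sequence is the label set `("person", "dog")`. Greedy decoding corresponds to `beam_width=1`; a wider beam trades more computation for a lower chance that one early mistake propagates through the rest of the prediction.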

3. Exploring Models and Data for Remote Sensing Image Caption Generation

Authors: Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, Xuelong Li

Abstract: Inspired by recent developments in artificial satellites, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal

Published in: arXiv, December 21, 2017

Link:

http://www.zhuanzhi.ai/document/33c9d18bd41a96f9df68d6364e2fb550

4. GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks

Authors: Alessandro Bay, Biswa Sengupta

Abstract: The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the latent embedding that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network, which bridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We utilise such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix using the GeoSeq2Seq formalism; our results show that for such a problem the probabilistic representation of the latent embedding supersedes the non-probabilistic embedding by 10-15%.

Published in: arXiv, January 6, 2018

Link:

http://www.zhuanzhi.ai/document/448502417b54e0c13b89ee833489377c
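
For readers unfamiliar with the terms in the abstract of paper 4, the standard textbook definitions of the Fisher information matrix and the Fisher kernel are given below. The notation ($p(x \mid \theta)$ for the parametric model, $F(\theta)$ for the Fisher information, $g_x$ for the score vector) is chosen here for illustration and is not taken from the paper; how GeoSeq2Seq applies these to the recurrent embedding and the Gaussian Mixture Model is not spelled out in the abstract.

```latex
\[
  F(\theta) = \mathbb{E}_{x \sim p(x \mid \theta)}
  \left[ \nabla_\theta \log p(x \mid \theta)\,
         \nabla_\theta \log p(x \mid \theta)^{\top} \right]
\]
\[
  g_x = \nabla_\theta \log p(x \mid \theta),
  \qquad
  K(x, y) = g_x^{\top}\, F(\theta)^{-1}\, g_y
\]
```

Here $g_x$ describes how the model parameters would need to change to better fit the sample $x$, and the kernel $K(x, y)$ compares two samples through those directions, weighted by the local geometry that $F(\theta)$ defines.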

5. Image Captioning using Deep Neural Architectures

Authors: Parth Shah, Vishvajit Bakarola, Supriya Pati

Abstract: Automatically creating the description of an image using a sentence in a natural language such as English is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses the different models available for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years. In addition, we discuss how such a model can be implemented. Finally, we evaluate the performance of the model using standard evaluation metrics.

Published in: arXiv, January 17, 2018

Link:

http://www.zhuanzhi.ai/document/d88e87447c9122fc4ef01738987f3f21
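
The abstract of paper 5 refers to standard evaluation metrics without naming them. BLEU is one metric commonly reported for image captioning, so the sketch below assumes it for illustration and computes only its core ingredient, the clipped ("modified") n-gram precision. It omits the brevity penalty and the geometric mean over n-gram orders, so it is not a full BLEU implementation; the function names and example sentences are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Clipped n-gram precision of a candidate caption against reference captions."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count observed in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "a dog runs on the grass".split()
references = ["a dog is running on the grass".split()]
p1 = modified_precision(candidate, references, 1)  # 5 of 6 unigrams match
p2 = modified_precision(candidate, references, 2)  # 3 of 5 bigrams match
print(p1, p2)
```

The clipping step is what stops a degenerate caption such as "the the the the the the the" from scoring well: against the reference "the cat is on the mat" it is credited with only 2 of its 7 unigrams.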

Originally published on the WeChat official account Zhuanzhi (Quan_Zhuanzhi)

Original publication date: 2018-02-01
