专栏首页专知【论文推荐】最新5篇图像描述生成(Image Caption)相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成(Image Caption)相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【导读】专知内容组整理了最近五篇图像描述生成(Image Caption)相关文章,为大家进行介绍,欢迎查看!

1. Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions(图像描述生成:一个有效地将情感结合到图像描述中的方案)

作者:Quanzeng You,Hailin Jin,Jiebo Luo

摘要:Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. However, for human beings, image caption writing is quite flexible and diverse, where additional language dimensions, such as emotion, humor and language styles, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic matching between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our model outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can also easily manipulate the model by assigning different sentiments to the testing image to generate captions with the corresponding sentiments.

期刊:arXiv, 2018年1月31日



2. Order-Free RNN with Visual Attention for Multi-Label Classification(无序RNN与视觉注意力机制结合的多标签分类)

作者:Shang-Fu Chen,Yi-Chen Chen,Chih-Kuan Yeh,Yu-Chiang Frank Wang

摘要:In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.

期刊:arXiv, 2017年12月20日



3. Exploring Models and Data for Remote Sensing Image Caption Generation(探索遥感图像描述生成的模型和数据)

作者:Xiaoqiang Lu,Binqiang Wang,Xiangtao Zheng,Xuelong Li

摘要:Inspired by recent development of artificial satellite, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection.However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate to describe the remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image caption. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing caption. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal

期刊:arXiv, 2017年12月21日



4. GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks(GeoSeq2Seq:信息几何序列到序列的网络)

作者:Alessandro Bay,Biswa Sengupta

摘要:The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the latent embedding, that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network which abridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We utilise such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix using the GeoSeq2Seq formalism; our results show that for such a problem the probabilistic representation of the latent embedding supersedes the non-probabilistic embedding by 10-15\%.

期刊:arXiv, 2018年1月6日



5. Image Captioning using Deep Neural Architectures(基于深度神经结构的图像描述生成)

作者:Parth Shah,Vishvajit Bakarola,Supriya Pati

摘要:Automatically creating the description of an image using any natural languages sentence like English is a very challenging task. It requires expertise of both image processing as well as natural language processing. This paper discuss about different available models for image captioning task. We have also discussed about how the advancement in the task of object recognition and machine translation has greatly improved the performance of image captioning model in recent years. In addition to that we have discussed how this model can be implemented. In the end, we have also evaluated the performance of model using standard evaluation matrices.

期刊:arXiv, 2018年1月17日



本文分享自微信公众号 - 专知(Quan_Zhuanzhi),作者:专知内容组(编)

原文出处及转载信息见文内详细说明,如有侵权,请联系 yunjia_community@tencent.com 删除。




0 条评论
登录 后参与评论


  • 最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

    【导读】专知内容组整理了最近生成对抗网络相关文章,为大家进行介绍,欢迎查看! 1. Semi-supervised FusedGAN for Condition...

  • 【论文推荐】最新7篇变分自编码器(VAE)相关论文—汉语诗歌、生成模型、跨模态、MR图像重建、机器翻译、推断、合成人脸

    【导读】专知内容组整理了最近七篇变分自编码器(Variational Autoencoders)相关文章,为大家进行介绍,欢迎查看! 1. Generating...

  • 【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

    【导读】专知内容组整理了最近六篇对抗自编码器(Adversarial Autoencoder)相关文章,为大家进行介绍,欢迎查看! 1. AAANE: Atte...

  • QQ x KAKAO 全系列联名设计来袭!

    ? 01 关于项目 | About the Project Kakao friends以手机聊天软件Kakao talk表情包为起点,推出家族八位成员的故事...

  • 用机器生成人类活动的层次结构来揭示机器的思维方式 (CS)


  • HTML4.01规范-文本(3)

    Note. The following section is an informative description of the behavior of som...

  • 2019-2020 设计趋势 · Avatar角色篇

    ? 腾讯ISUX isux.tencent.com 社交用户体验设计 ? ? ? 在本文中,我们将分享对于最近又逐渐流行起来的角色服务的见解。随着IT世界成...

  • 参数化与人工智能,从计算机辅助到计算机决策,同济大学DigitalFuture演讲记录


  • TCN v2 + 3Dconv 运动信息


  • 用超图进行超真实感图像填充(CS CV)