[Paper Recommendations] 5 Recent Image Captioning Papers: Sentiment, Attention Mechanisms, Remote Sensing Images, Sequence-to-Sequence, Deep Neural Architectures

[Overview] The Zhuanzhi content team has compiled five recent papers on image captioning (image caption generation) and introduces them below. Enjoy!

1. Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions



Authors: Quanzeng You, Hailin Jin, Jiebo Luo

Abstract: Automatic image captioning has recently approached human-level performance thanks to the latest advances in computer vision and natural language understanding. However, most current models can only generate plain factual descriptions of a given image's content. For human beings, in contrast, caption writing is quite flexible and diverse: additional language dimensions such as emotion, humor, and language style are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic match between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our models outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can easily manipulate the model by assigning different sentiments to the test image to generate captions with the corresponding sentiments.
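The abstract does not specify where the sentiment signal enters the decoder. As a rough illustration only (NumPy; all names, dimensions, and the injection point are hypothetical, not the authors' exact models), one plausible scheme is to concatenate a fixed sentiment vector to the word embedding at every decoding step of an LSTM caption decoder:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in z as [input, forget, output, candidate]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = (1.0 / (1.0 + np.exp(-z[k * n:(k + 1) * n])) for k in range(3))
    g = np.tanh(z[3 * n:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def decode_with_sentiment(image_feat, sentiment_vec, embed, W, U, b, W_out, steps=3):
    """Greedy decoding where a sentiment vector is concatenated to each word
    embedding -- one of several plausible injection points, chosen for illustration."""
    h = np.tanh(image_feat)              # init hidden state from the image feature
    c = np.zeros_like(h)
    token = 0                            # assumed <BOS> token id
    out = []
    for _ in range(steps):
        x = np.concatenate([embed[token], sentiment_vec])
        h, c = lstm_step(x, h, c, W, U, b)
        token = int(np.argmax(W_out @ h))  # greedy next-word choice
        out.append(token)
    return out

# toy usage with random weights, just to show the expected shapes
rng = np.random.default_rng(0)
n, d, s, vocab = 8, 5, 3, 10
W = rng.normal(size=(4 * n, d + s))      # input projection sees embedding + sentiment
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
embed = rng.normal(size=(vocab, d))
W_out = rng.normal(size=(vocab, n))
tokens = decode_with_sentiment(rng.normal(size=n), rng.normal(size=s),
                               embed, W, U, b, W_out)
```

Swapping the sentiment vector at test time then changes the generated word sequence without retraining, which is the kind of controllability the abstract describes.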

Venue: arXiv, January 31, 2018

Link:

http://www.zhuanzhi.ai/document/71ae60a957ad68f4a80e330d05e67ef0

2. Order-Free RNN with Visual Attention for Multi-Label Classification



Authors: Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang

Abstract: In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such network architectures typically requires pre-defined label sequences. For multi-label classification, it is desirable to have a robust inference process so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses this problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, our proposed network model achieves efficient prediction of multiple labels.
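The key point of "order-free" prediction is that the final output is a label *set*, so it does not matter in which order the decoder emitted the labels. A minimal sketch of beam search over labels with a stop symbol (plain Python; the scorer interface and the toy scorer are hypothetical, not the paper's decoding procedure):

```python
def beam_search_labels(score_fn, num_labels, beam=3, max_len=4):
    """Order-free multi-label prediction via beam search.
    score_fn(chosen_labels) -> list of log-probs over num_labels + 1 entries,
    where the last index is a STOP symbol. Returns the label SET of the best
    finished hypothesis, so the emission order is irrelevant."""
    STOP = num_labels
    beams = [([], 0.0)]                  # (label sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        cand = []
        for seq, lp in beams:
            logp = score_fn(seq)
            for lab in range(num_labels + 1):
                if lab in seq:
                    continue             # each label is predicted at most once
                if lab == STOP:
                    finished.append((seq, lp + logp[lab]))
                else:
                    cand.append((seq + [lab], lp + logp[lab]))
        beams = sorted(cand, key=lambda t: -t[1])[:beam]
        if not beams:
            break
    best = max(finished + beams, key=lambda t: t[1])
    return set(best[0])

# toy scorer: strongly prefers labels 1 and 3 (in either order), then STOP
def toy_scores(seq):
    logp = [-5.0] * 5                    # 4 labels + STOP at index 4
    if 1 not in seq:
        logp[1] = -0.1
    elif 3 not in seq:
        logp[3] = -0.2
    else:
        logp[4] = -0.05                  # both present: stop is now cheapest
    return logp

predicted = beam_search_labels(toy_scores, num_labels=4)
```

In the paper's setting the scorer would be the attention-LSTM network conditioned on the image and the labels chosen so far; here a toy scorer stands in for it.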

Venue: arXiv, December 20, 2017

Link:

http://www.zhuanzhi.ai/document/0a34485f1b9e7e60314cc7ebe21d0add

3. Exploring Models and Data for Remote Sensing Image Caption Generation



Authors: Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, Xuelong Li

Abstract: With the recent development of artificial satellites, remote sensing images have attracted extensive attention, and noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotation instructions are presented to better describe remote sensing images, considering their special characteristics. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of a remote sensing image can be completely described by generated language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal

Venue: arXiv, December 21, 2017

Link:

http://www.zhuanzhi.ai/document/33c9d18bd41a96f9df68d6364e2fb550

4. GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks



Authors: Alessandro Bay, Biswa Sengupta

Abstract: The Fisher information metric is an important foundation of information geometry, as it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as Sequence-to-Sequence (Seq2Seq) networks, which have lately been used to yield state-of-the-art performance on speech translation and image captioning, have so far ignored the geometry of the latent embedding that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network, which bridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as the Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We use such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix with the GeoSeq2Seq formalism; our results show that for this problem the probabilistic representation of the latent embedding outperforms the non-probabilistic embedding by 10-15%.
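The Fisher-kernel encoding of latent vectors under a GMM can be sketched as follows. This is a simplified illustration (NumPy; gradients with respect to the component means only, diagonal covariances, and without the power/L2 normalisation used in full Fisher-vector pipelines), not the authors' implementation:

```python
import numpy as np

def fisher_vector_means(X, pi, mu, sigma):
    """Fisher-vector-style encoding: normalised gradient of the average GMM
    log-likelihood w.r.t. the component means.
    X: (T, D) latent vectors; pi: (K,) mixture weights;
    mu: (K, D) means; sigma: (K, D) per-dimension standard deviations."""
    T, D = X.shape
    # standardised differences to every component: (T, K, D)
    diff = (X[:, None, :] - mu[None]) / sigma[None]
    # log of (unnormalised) component densities, numerically stabilised
    logp = (np.log(pi)[None]
            - 0.5 * (diff ** 2).sum(-1)
            - np.log(sigma).sum(-1)[None])
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)        # responsibilities (T, K)
    # gradient w.r.t. each mean, scaled by 1 / (T * sqrt(pi_k))
    G = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(pi)[:, None])
    return G.ravel()                                  # fixed-length (K*D,) code

# toy usage: encode 20 latent vectors of dimension 4 with a 3-component GMM
rng = np.random.default_rng(1)
T, K, D = 20, 3, 4
X = rng.normal(size=(T, D))
pi = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
sigma = np.ones((K, D))
fv = fisher_vector_means(X, pi, mu, sigma)
```

The point of the encoding is that a variable-length set of latent vectors is mapped to a fixed-length vector whose inner products approximate the Fisher kernel, which is what makes it usable as a drop-in representation inside a Seq2Seq pipeline.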

Venue: arXiv, January 6, 2018

Link:

http://www.zhuanzhi.ai/document/448502417b54e0c13b89ee833489377c

5. Image Captioning using Deep Neural Architectures



Authors: Parth Shah, Vishvajit Bakarola, Supriya Pati

Abstract: Automatically describing an image with a natural-language sentence, for example in English, is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses the different available models for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years, and how such a model can be implemented. Finally, we evaluate the model's performance using standard evaluation metrics.
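The standard evaluation metrics for captioning include BLEU, METEOR, and CIDEr. As a concrete example, unigram BLEU (BLEU-1) with a brevity penalty can be computed as below — a minimal sketch of the metric's idea, not a reference implementation (real evaluations use BLEU-4 with smoothing and standard tokenisation):

```python
from collections import Counter
import math

def bleu1(candidate, references):
    """Unigram BLEU with brevity penalty for one candidate caption
    against one or more reference captions (whitespace tokenisation)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    # clipped unigram counts: each candidate word counts at most as many
    # times as it appears in the most generous reference
    max_ref = Counter()
    for r in refs:
        for w, n in Counter(r).items():
            max_ref[w] = max(max_ref[w], n)
    clipped = sum(min(n, max_ref[w]) for w, n in Counter(cand).items())
    precision = clipped / max(len(cand), 1)
    # brevity penalty against the reference closest in length
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * precision
```

A perfect match scores 1.0, while repeating one correct word is penalised by the clipping, which is exactly the failure mode plain precision would miss.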

Venue: arXiv, January 17, 2018

Link:

http://www.zhuanzhi.ai/document/d88e87447c9122fc4ef01738987f3f21

This article is shared from the WeChat public account Zhuanzhi (Quan_Zhuanzhi). Author: Zhuanzhi content team (eds.).


Originally published: February 1, 2018
