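[Paper Recommendations] Seven Recent Papers on Visual Question Answering (VQA): Differential Attention, Visual Question Reasoning, Visual Dialog, Data Visualization, Memory-Augmented Networks, Explicit Reasoning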

[Overview] The Zhuanzhi content team has compiled seven recent papers on Visual Question Answering (VQA) and introduces them below.

1. Differential Attention for Visual Question Answering

Authors: Badri Patro, Vinay P. Namboodiri

Abstract: In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that the previous systems focus on are not correlated with the regions that humans focus on. The accuracy is limited due to this drawback. In this paper, we propose to solve this problem by using an exemplar based method. We obtain one or more supporting and opposing exemplars to obtain a differential attention region. This differential attention is closer to human attention than other image based attention methods. It also helps in obtaining improved accuracy when answering questions. The method is evaluated on challenging benchmark datasets. We perform better than other image based attention methods and are competitive with other state of the art methods that focus on both image and questions.

Published: arXiv, April 1, 2018

Link:

http://www.zhuanzhi.ai/document/4d699e2dd5fd932eb9309e15139ffa56

2. Visual Question Reasoning on General Dependency Tree

Authors: Qingxing Cao, Xiaodan Liang, Bailing Li, Guanbin Li, Liang Lin

Institution: Sun Yat-sen University

Abstract: The collaborative reasoning for understanding each image-question pair is very critical but under-explored for an interpretable Visual Question Answering (VQA) system. Although very recent works also tried the explicit compositional processes to assemble multiple sub-tasks embedded in the questions, their models heavily rely on the annotations or hand-crafted rules to obtain valid reasoning layout, leading to either heavy labor or poor performance on composition reasoning. In this paper, to enable global context reasoning for better aligning image and language domains in diverse and unrestricted cases, we propose a novel reasoning network called Adversarial Composition Modular Network (ACMN). This network comprises of two collaborative modules: i) an adversarial attention module to exploit the local visual evidence for each word parsed from the question; ii) a residual composition module to compose the previously mined evidence. Given a dependency parse tree for each question, the adversarial attention module progressively discovers salient regions of one word by densely combining regions of child word nodes in an adversarial manner. Then residual composition module merges the hidden representations of an arbitrary number of children through sum pooling and residual connection. Our ACMN is thus capable of building an interpretable VQA system that gradually dives the image cues following a question-driven reasoning route and makes global reasoning by incorporating the learned knowledge of all attention modules in a principled manner. Experiments on relational datasets demonstrate the superiority of our ACMN and visualization results show the explainable capability of our reasoning system.

Published: arXiv, March 31, 2018

Link:

http://www.zhuanzhi.ai/document/05727d37932097fb1236283d12b001fc
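
The ACMN abstract above says the residual composition module merges the hidden representations of an arbitrary number of children "through sum pooling and residual connection". The sketch below shows one plausible reading of that single step in plain Python. It is not the authors' implementation: the function names, the way the node's own evidence is combined with the pooled children, and the ReLU used in place of learned layers are all assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def sum_pool(children: List[Vector], dim: int) -> Vector:
    """Element-wise sum of any number of child vectors (zeros if there are none)."""
    pooled = [0.0] * dim
    for child in children:
        for i, value in enumerate(child):
            pooled[i] += value
    return pooled


def relu(vec: Vector) -> Vector:
    """Stand-in for the learned transform; a real module would use trained layers."""
    return [max(0.0, x) for x in vec]


def compose(node_evidence: Vector,
            child_hiddens: List[Vector],
            transform: Callable[[Vector], Vector] = relu) -> Vector:
    """Merge a node's evidence with its children (hypothetical formulation).

    1. Sum-pool the children, so the number of children does not matter.
    2. Add the pooled vector to the node's own evidence.
    3. Apply a residual connection: output = merged + transform(merged).
    """
    pooled = sum_pool(child_hiddens, len(node_evidence))
    merged = [a + b for a, b in zip(node_evidence, pooled)]
    return [m + t for m, t in zip(merged, transform(merged))]


if __name__ == "__main__":
    # A node with two children and a leaf node with none use the same code path.
    print(compose([1.0, -2.0], [[0.5, 0.5], [1.5, -0.5]]))
    print(compose([1.0, -2.0], []))
```

Because sum pooling is independent of the order and count of children, the same function can be applied recursively at every node of a dependency parse tree, from the leaves up to the root.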

3. DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer

Authors: Joseph Suarez, Justin Johnson, Fei-Fei Li

Institution: Stanford University

Abstract: We present a novel Dynamic Differentiable Reasoning (DDR) framework for jointly learning branching programs and the functions composing them; this resolves a significant nondifferentiability inhibiting recent dynamic architectures. We apply our framework to two settings in two highly compact and data efficient architectures: DDRprog for CLEVR Visual Question Answering and DDRstack for reverse Polish notation expression evaluation. DDRprog uses a recurrent controller to jointly predict and execute modular neural programs that directly correspond to the underlying question logic; it explicitly forks subprocesses to handle logical branching. By effectively leveraging additional structural supervision, we achieve a large improvement over previous approaches in subtask consistency and a small improvement in overall accuracy. We further demonstrate the benefits of structural supervision in the RPN setting: the inclusion of a stack assumption in DDRstack allows our approach to generalize to long expressions where an LSTM fails the task.

Published: arXiv, March 30, 2018

Link:

http://www.zhuanzhi.ai/document/ebdf956afa949ee32f020cdb356f7cd3
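
For readers unfamiliar with the reverse Polish notation (RPN) task mentioned in the DDRprog abstract above, the sketch below shows the classical symbolic stack algorithm for evaluating an RPN expression. It only illustrates the task and why a stack is a natural structural assumption for it. It is not DDRstack, which according to the abstract learns the evaluation with a neural architecture; the function name `eval_rpn` and the set of four operators are choices made here for illustration.

```python
import operator
from typing import Callable, Dict, List

# Binary operators supported by this illustrative evaluator.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens: List[str]) -> float:
    """Evaluate a reverse Polish notation expression with an explicit stack."""
    stack: List[float] = []
    for token in tokens:
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("operator %r needs two operands" % token)
            right = stack.pop()  # the most recent operand is the right-hand side
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))  # operands are pushed as numbers
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


if __name__ == "__main__":
    # (3 + 4) * 2
    print(eval_rpn("3 4 + 2 *".split()))
```

The stack depth depends only on how deeply the expression nests, not on how long it is, which gives some intuition for why an explicit stack could help with long expressions.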

4. Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

Authors: Unnat Jain, Svetlana Lazebnik, Alexander Schwing

Institution: UIUC

Abstract: Human conversation is a complex mechanism with subtle nuances. It is hence an ambitious goal to develop artificial intelligence agents that can participate fluently in a conversation. While we are still far from achieving this goal, recent progress in visual question answering, image captioning, and visual question generation shows that dialog systems may be realizable in the not too distant future. To this end, a novel dataset was introduced recently and encouraging results were demonstrated, particularly for question answering. In this paper, we demonstrate a simple symmetric discriminative baseline, that can be applied to both predicting an answer as well as predicting a question. We show that this method performs on par with the state of the art, even memory net based methods. In addition, for the first time on the visual dialog dataset, we assess the performance of a system asking questions, and demonstrate how visual dialog can be generated from discriminative question generation and question answering.

Published: arXiv, March 30, 2018

Link:

http://www.zhuanzhi.ai/document/8f1d5d863cc03d7b3eff2f2fa29cd19d

5. DVQA: Understanding Data Visualizations via Question Answering

Authors: Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan

Institutions: Adobe Research, Rochester Institute of Technology

Abstract: Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them. Existing methods fail when faced with even minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. Our work will enable algorithms to automatically extract numeric and semantic information from vast quantities of bar charts found in scientific publications, Internet articles, business reports, and many other areas.

Published: arXiv, March 30, 2018

Link:

http://www.zhuanzhi.ai/document/db997baf09a2e6889b14993575925ea4

6. Visual Question Answering with Memory-Augmented Networks

Authors: Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton van den Hengel, Ian Reid

Institution: The University of Adelaide

Abstract: In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set. The memory network incorporates both internal and external memory blocks and selectively pays attention to each training exemplar. We show that memory-augmented neural networks are able to maintain a relatively long-term memory of scarce training exemplars, which is important for visual question answering due to the heavy-tailed distribution of answers in a general VQA setting. Experimental results on two large-scale benchmark datasets show the favorable performance of the proposed algorithm with a comparison to state of the art.

Published: arXiv, March 25, 2018

Link:

http://www.zhuanzhi.ai/document/a44ac794e983a57c9d4f1c7407f0eefd
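
The abstract above does not spell out how the external memory is read, so the sketch below shows only the generic idea of content-based addressing that is common in memory-augmented networks: a query is compared with stored keys, the similarities are turned into attention weights with a softmax, and the read vector is the weighted sum of the stored values. The names `read_memory` and `sharpness`, and the use of cosine similarity, are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query: Sequence[float],
                keys: Sequence[Sequence[float]],
                values: Sequence[Sequence[float]],
                sharpness: float = 5.0) -> Tuple[List[float], List[float]]:
    """Return (attention weights, read vector) for a content-based memory read."""
    weights = softmax([sharpness * cosine(query, key) for key in keys])
    dim = len(values[0])
    read = [sum(w * value[i] for w, value in zip(weights, values))
            for i in range(dim)]
    return weights, read


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]               # one key per stored exemplar
    values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # e.g. answer embeddings
    weights, read = read_memory([1.0, 0.0], keys, values)
    print(weights, read)
```

In this toy setup, a query that matches the first key puts most of the attention on the first stored exemplar, however rarely that exemplar was seen. That is the intuition behind using memory for rare answers.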

7. Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Authors: Somak Aditya, Yezhou Yang, Chitta Baral

Institution: Arizona State University

Abstract: Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempted to solve the task using deep neural architectures and achieved promising performance. However, the resulting systems are generally opaque and they struggle in understanding questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based engine to reason over a basket of inputs: visual relations, the semantic parse of the question, and background ontological knowledge from word2vec and ConceptNet. Experimental analysis of the answers and the key evidential predicates generated on the VQA dataset validate our approach.

Published: arXiv, March 24, 2018

Link:

http://www.zhuanzhi.ai/document/5f9f8b81a22d7eaa6dc643b6dcc01925
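
The abstract above names Probabilistic Soft Logic (PSL) as the reasoning engine. PSL relaxes Boolean logic to truth values in [0, 1] using the Lukasiewicz operators and scores each weighted rule by its "distance to satisfaction". The sketch below illustrates just those formulas in plain Python. It does not use the PSL software or reproduce the paper's rules, and the example predicates and truth values are invented for illustration.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction: min(1, a + b)."""
    return min(1.0, a + b)


def luk_not(a: float) -> float:
    """Lukasiewicz negation: 1 - a."""
    return 1.0 - a


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the rule `body -> head` is from being satisfied: max(0, body - head)."""
    return max(0.0, body - head)


def rule_penalty(weight: float, body: float, head: float,
                 squared: bool = False) -> float:
    """Weighted hinge penalty of one rule; inference minimizes the sum of these."""
    distance = distance_to_satisfaction(body, head)
    return weight * (distance ** 2 if squared else distance)


if __name__ == "__main__":
    # Hypothetical soft rule: holds(person, umbrella) AND relevant(question, weather)
    #                         -> answer(raining)
    holds = 0.75      # invented confidence from a visual relation detector
    relevant = 0.5    # invented similarity between question and concept
    body = luk_and(holds, relevant)
    print(body, rule_penalty(2.0, body, head=0.125))
```

A rule whose head is at least as true as its body incurs no penalty. Otherwise the penalty grows linearly (or quadratically) with the gap, which keeps inference over many such rules a convex optimization problem.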

-END-

This article was shared from the WeChat official account Zhuanzhi (Quan_Zhuanzhi).

Originally published: 2018-04-19
