
[Paper Recommendations] Five Recent Visual Question Answering Papers: Deep Learning Evaluation, Reciprocal Attention Fusion, VizWiz, Guided Attention, Question Understanding

[Overview] Following the six Visual Question Answering (VQA) papers released yesterday, the Zhuanzhi content team today presents five more recent VQA papers. Enjoy!

8. Did the Model Understand the Question?



Authors: Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere

ACL 2018 long paper

Institutions: University of Chicago, Google

Abstract: We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.

Published: arXiv, May 15, 2018

Link:

http://www.zhuanzhi.ai/document/0851e20a331fc05fcd7228666f3b7539
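
The attack described in the abstract is easy to prototype: compute per-word attributions for the model's answer, find the question words the model effectively ignores, and perturb those. Below is a minimal, self-contained sketch of that loop. The paper itself uses Integrated Gradients on real VQA and table-QA models; the leave-one-out attribution proxy and the `model_confidence` stand-in here are illustrative assumptions so the example runs on its own.

```python
# Hedged sketch: attribution-guided question perturbation.
# `model_confidence` is a hypothetical stand-in for a real VQA model's
# confidence in its top answer; the paper uses Integrated Gradients,
# while this uses a simpler leave-one-out proxy to illustrate the idea.

def model_confidence(question_tokens, image=None):
    # Stand-in: pretends the model only attends to a few key words.
    important = {"color", "many", "what"}
    overlap = len(important.intersection(question_tokens))
    return 0.3 + 0.2 * overlap

def leave_one_out_attributions(tokens):
    """Attribution of each token = confidence drop when it is removed."""
    base = model_confidence(tokens)
    return {
        t: base - model_confidence([u for u in tokens if u != t])
        for t in tokens
    }

def craft_adversarial(tokens, filler="tweet"):
    """Replace the lowest-attribution token with an irrelevant word.
    If the model barely attends to that token, its answer should barely
    change, which exposes the weakness the paper documents."""
    attrib = leave_one_out_attributions(tokens)
    weakest = min(attrib, key=attrib.get)
    return [filler if t == weakest else t for t in tokens]

question = "what color is the tennis ball".split()
print(leave_one_out_attributions(question))
print(craft_adversarial(question))
```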

9. Deep learning evaluation using deep linguistic processing



Authors: Alexander Kuhnle, Ann Copestake

Institution: University of Cambridge

Abstract: We discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing 'deep' linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail, as compared to a single performance value on a static and monolithic dataset.

Published: arXiv, May 12, 2018

Link:

http://www.zhuanzhi.ai/document/09d0309bd825aacee63502631c317504
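
The core idea, generating abstract data whose ground truth is known by construction so that each linguistic construction can be tested in isolation, can be illustrated in a few lines. The sketch below is a toy generator for existence and counting questions; it is in the spirit of the authors' approach (they are also behind the ShapeWorld framework) but is not their pipeline.

```python
# Hedged sketch: artificial abstract VQA data with ground truth by
# construction. Illustrative toy, not the authors' actual generator.
import random

COLORS = ["red", "green", "blue"]
SHAPES = ["square", "circle", "triangle"]

def sample_scene(n_objects=5):
    """An abstract 'scene' is just a bag of (color, shape) objects."""
    return [(random.choice(COLORS), random.choice(SHAPES))
            for _ in range(n_objects)]

def existence_question(scene):
    """Tests understanding of existential statements."""
    color, shape = random.choice(COLORS), random.choice(SHAPES)
    answer = (color, shape) in scene
    return f"is there a {color} {shape}?", "yes" if answer else "no"

def counting_question(scene):
    """Tests counting, independently of existence."""
    color = random.choice(COLORS)
    count = sum(1 for c, _ in scene if c == color)
    return f"how many {color} objects are there?", str(count)

random.seed(0)
scene = sample_scene()
print(scene)
print(existence_question(scene))
print(counting_question(scene))
```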

10. Reciprocal Attention Fusion for Visual Question Answering



Authors: Moshiur R Farazi, Salman Khan

Institution: Australian National University

Abstract: Existing attention mechanisms either attend to local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual details. The bottom-up attention thus generated is further coalesced with the top-down information to only focus on the scene elements that are most relevant to a given question. Our design hierarchically fuses multi-modal information i.e., language, object- and grid-level features, through an efficient tensor decomposition scheme. The proposed model improves the state-of-the-art single model performances from 67.9% to 68.2% on VQAv1 and from 65.3% to 67.4% on VQAv2, demonstrating a significant boost.

Published: arXiv, May 11, 2018

Link:

http://www.zhuanzhi.ai/document/3e4851d75cdd3c9e57d95bf61ad183a2
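
The "efficient tensor decomposition scheme" in the abstract is, in essence, a low-rank factorization of a bilinear fusion between the question and visual features. A minimal PyTorch sketch of such a fusion, applied to both grid- and object-level features, is below; the dimensions and rank are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: low-rank bilinear fusion of language with grid- and
# object-level visual features. Dimensions/rank are assumptions.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """z = P^T(U^T q * V^T v): a rank-constrained bilinear form."""
    def __init__(self, q_dim, v_dim, out_dim, rank=256):
        super().__init__()
        self.U = nn.Linear(q_dim, rank, bias=False)    # project question
        self.V = nn.Linear(v_dim, rank, bias=False)    # project visual
        self.P = nn.Linear(rank, out_dim, bias=False)  # project joint

    def forward(self, q, v):
        # Elementwise product of the projections equals a low-rank
        # approximation of a full (and far more expensive) bilinear map.
        return self.P(self.U(q) * self.V(v))

q = torch.randn(8, 1024)     # question embedding (batch of 8)
grid = torch.randn(8, 2048)  # attended grid-level visual feature
obj = torch.randn(8, 2048)   # attended object-level visual feature

fuse = LowRankFusion(1024, 2048, 512)
joint = fuse(q, grid) + fuse(q, obj)  # merge both levels of detail
print(joint.shape)           # torch.Size([8, 512])
```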

11. VizWiz Grand Challenge: Answering Visual Questions from Blind People



Authors: Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham

Institutions: University of Texas at Austin, University of Science and Technology of China, University of Colorado Boulder, Carnegie Mellon University, University of Rochester

Abstract: The study of algorithms to automatically answer visual questions currently is motivated by visual question answering (VQA) datasets constructed in artificial VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising from a natural VQA setting. VizWiz consists of over 31,000 visual questions originating from blind people who each took a picture using a mobile phone and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. VizWiz differs from the many existing VQA datasets because (1) images are captured by blind photographers and so are often poor quality, (2) questions are spoken and so are more conversational, and (3) often visual questions cannot be answered. Evaluation of modern algorithms for answering visual questions and deciding if a visual question is answerable reveals that VizWiz is a challenging dataset. We introduce this dataset to encourage a larger community to develop more generalized algorithms that can assist blind people.

Published: arXiv, May 10, 2018

Link:

http://www.zhuanzhi.ai/document/3a6116d0d7b4f29153b220e7de0745d3
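
Since each VizWiz question ships with 10 crowdsourced answers, evaluation typically follows the consensus-based VQA accuracy metric: a prediction gets full credit once at least three annotators agree with it, and partial credit below that. A minimal sketch follows; note the official evaluation script may differ in details such as answer normalization.

```python
# Hedged sketch of the consensus VQA accuracy metric used with
# crowdsourced answers: min(#matching humans / 3, 1).

def vqa_accuracy(predicted, human_answers):
    """Soft credit: full marks if >= 3 of the annotators agree."""
    predicted = predicted.strip().lower()
    matches = sum(1 for a in human_answers
                  if a.strip().lower() == predicted)
    return min(matches / 3.0, 1.0)

# Hypothetical set of 10 crowdsourced answers for one visual question.
humans = ["coca cola", "coke", "coca cola", "soda", "coca cola",
          "unanswerable", "coca cola", "coke", "coca cola", "pop"]
print(vqa_accuracy("coca cola", humans))     # 1.0  (5 matches >= 3)
print(vqa_accuracy("coke", humans))          # 0.67 (2 matches)
print(vqa_accuracy("unanswerable", humans))  # 0.33 (1 match)
```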

12. Guided Attention for Large Scale Scene Text Verification



Authors: Dafang He, Yeqing Li, Alexander Gorban, Derrall Heath, Julian Ibarz, Qian Yu, Daniel Kifer, C. Lee Giles

Institutions: The Pennsylvania State University, Google

Abstract: Many tasks are related to determining if a particular text string exists in an image. In this work, we propose a new framework that learns this task in an end-to-end way. The framework takes an image and a text string as input and then outputs the probability of the text string being present in the image. This is the first end-to-end framework that learns such relationships between text and images in the scene text area. The framework does not require explicit scene text detection or recognition and thus no bounding box annotations are needed for it. It is also the first work in the scene text area that tackles such a weakly labeled problem. Based on this framework, we developed a model called Guided Attention. Our designed model achieves much better results than several state-of-the-art scene text reading based solutions for a challenging Street View Business Matching task. The task tries to find correct business names for storefront images, and the dataset we collected for it is substantially larger and more challenging than existing scene text datasets. This new real-world task provides a new perspective for studying scene text related problems. We also demonstrate the uniqueness of our task via a comparison between our problem and a typical Visual Question Answering problem.

Published: arXiv, April 24, 2018

Link:

http://www.zhuanzhi.ai/document/3062537c307bfe8eee23308ae822e746
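
The framework's interface is simple: image plus query string in, probability out, with the text embedding guiding attention over spatial image features (the "guided" part). The PyTorch sketch below captures that interface; the layer sizes and the single-GRU character encoder are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch: text-string verification with text-guided attention.
# Architecture details are assumptions made for illustration.
import torch
import torch.nn as nn

class TextVerifier(nn.Module):
    def __init__(self, vocab_size=128, emb=64, feat=256):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, emb)
        self.text_rnn = nn.GRU(emb, feat, batch_first=True)
        self.attn = nn.Linear(feat, feat)    # text-guided attention query
        self.score = nn.Linear(feat * 2, 1)  # fused -> probability

    def forward(self, image_feats, char_ids):
        # image_feats: (B, N, feat) spatial features; char_ids: (B, L)
        _, h = self.text_rnn(self.char_emb(char_ids))
        text = h[-1]                                   # (B, feat)
        logits = torch.bmm(image_feats, self.attn(text).unsqueeze(2))
        weights = torch.softmax(logits, dim=1)         # attend where the text fits
        attended = (weights * image_feats).sum(dim=1)  # (B, feat)
        fused = torch.cat([attended, text], dim=1)
        return torch.sigmoid(self.score(fused)).squeeze(1)  # P(string in image)

model = TextVerifier()
img = torch.randn(4, 49, 256)         # e.g. a 7x7 CNN feature grid
txt = torch.randint(0, 128, (4, 12))  # character-encoded query strings
print(model(img, txt).shape)          # torch.Size([4])
```

Because no bounding boxes are needed, training only requires image-level (string, present/absent) labels, which is what makes the weakly labeled setup in the abstract possible.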

-END-

Originally published on the WeChat public account 专知 (Quan_Zhuanzhi).

Original publication date: 2018-06-08
