【Paper Recommendations】Seven Recent Papers on Self-Attention: Structured Self-Attention, Relative Positions, Hybrid Attention, Sentence Representations, and Text Embeddings

【Overview】The Zhuanzhi content team has compiled seven recent papers on the self-attention mechanism and introduces them below. Enjoy!

1. A Structured Self-attentive Sentence Embedding


Authors: Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio

Affiliations: Montreal Institute for Learning Algorithms, Université de Montréal

Abstract: This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.

Published: arXiv, March 9, 2018

Link:

http://www.zhuanzhi.ai/document/c5ceb3ae25d3a3be4848bd0182d210c4
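
For readers who want to see the mechanism concretely, here is a minimal NumPy sketch of the matrix sentence embedding described in the abstract: r attention rows are computed as A = softmax(W_s2 tanh(W_s1 H^T)) over the encoder states H, the embedding is M = A H, and the redundancy penalty is the squared Frobenius norm of A A^T - I. The toy dimensions and random weights below are placeholder assumptions, not the paper's settings.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """H: (n, 2u) hidden states of a BiLSTM over an n-token sentence.
    Returns the (r, 2u) matrix embedding M, the attention matrix A, and the penalty P."""
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)        # (r, n): r attention distributions over tokens
    M = A @ H                                                # (r, 2u) sentence embedding matrix
    I = np.eye(A.shape[0])
    P = np.linalg.norm(A @ A.T - I, ord="fro") ** 2          # redundancy penalty ||AA^T - I||_F^2
    return M, A, P

# toy dimensions: n tokens, 2u-dim states, d_a hidden units, r attention hops
n, two_u, d_a, r = 6, 8, 5, 3
rng = np.random.default_rng(0)
H = rng.normal(size=(n, two_u))
W_s1 = rng.normal(size=(d_a, two_u))
W_s2 = rng.normal(size=(r, d_a))
M, A, P = structured_self_attention(H, W_s1, W_s2)
print(M.shape, A.sum(axis=1), P)   # (3, 8), each row of A sums to 1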

2. Self-Attention with Relative Position Representations


Authors: Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

Affiliation: Google Brain

Abstract: Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.

Published: arXiv, March 6, 2018

Link:

http://www.zhuanzhi.ai/document/b5f7daac1262e2172682b9146b2dc6d3
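
The NumPy sketch below illustrates the relative-position extension for a single attention head: the relative distance j - i is clipped to [-k, k], and learned embeddings aK and aV of those clipped distances are added to the keys and values, respectively. The dimensions, random weights, and clipping window k are arbitrary toy values assumed for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(X, Wq, Wk, Wv, aK, aV, k):
    """X: (n, d) inputs. aK, aV: (2k+1, d_z) embeddings of the clipped relative distances -k..k."""
    n, d_z = X.shape[0], Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                      # (n, d_z) each
    # relative position index of every (i, j) pair, clipped to [-k, k]
    rel = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -k, k) + k
    aK_ij, aV_ij = aK[rel], aV[rel]                        # (n, n, d_z)
    # e_ij = q_i . (k_j + aK_ij) / sqrt(d_z)
    e = (Q @ K.T + np.einsum("id,ijd->ij", Q, aK_ij)) / np.sqrt(d_z)
    alpha = softmax(e, axis=-1)                            # (n, n) attention weights
    # z_i = sum_j alpha_ij (v_j + aV_ij)
    return alpha @ V + np.einsum("ij,ijd->id", alpha, aV_ij)

n, d, d_z, k = 5, 8, 4, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_z)) for _ in range(3))
aK, aV = rng.normal(size=(2 * k + 1, d_z)), rng.normal(size=(2 * k + 1, d_z))
print(relative_self_attention(X, Wq, Wk, Wv, aK, aV, k).shape)   # (5, 4)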

3. Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling


Authors: Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, Chengqi Zhang

Affiliations: University of Technology Sydney, University of Washington, Griffith University

Abstract: Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.

Published: arXiv, January 31, 2018

Link:

http://www.zhuanzhi.ai/document/e51e2566ca96c1a0e754df04a648eeb0
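
ReSA's exact parameterization is not spelled out in the abstract, so the sketch below only illustrates the division of labor it describes: a hard selector samples token subsets in parallel (a crude stand-in for "reinforced sequence sampling"), and soft self-attention is then computed only over the selected tokens. The scoring networks, masking details, and the policy-gradient training loop are simplifications assumed here purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rss_select(X, w, rng):
    """Hypothetical hard selector: keep each token with a probability from a tiny scorer."""
    p_keep = sigmoid(X @ w)                 # (n,) keep-probabilities
    keep = rng.random(X.shape[0]) < p_keep  # parallel binary sampling over tokens
    return keep, p_keep                     # p_keep would feed policy-gradient training (omitted)

def resa_forward(X, w_head, w_dep, Wq, Wk, Wv, rng):
    """Soft self-attention restricted to the token subsets picked by two hard selectors."""
    heads, _ = rss_select(X, w_head, rng)   # tokens allowed to attend
    deps, _ = rss_select(X, w_dep, rng)     # tokens allowed to be attended to
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(Wq.shape[1])
    logits[:, ~deps] = -1e9                 # mask out unselected dependents
    Z = softmax(logits, axis=-1) @ V
    Z[~heads] = 0.0                         # unselected heads produce no output
    return Z

n, d, d_z = 6, 8, 4
rng = np.random.default_rng(2)
X = rng.normal(size=(n, d))
w_head, w_dep = rng.normal(size=d), rng.normal(size=d)
Wq, Wk, Wv = (rng.normal(size=(d, d_z)) for _ in range(3))
print(resa_forward(X, w_head, w_dep, Wq, Wk, Wv, rng).shape)   # (6, 4)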

4. Distance-based Self-Attention Network for Natural Language Inference


Authors: Jinbae Im, Sungzoon Cho

Affiliation: Seoul National University

Abstract: Attention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, Directional Self Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance with various data by using forward and backward directional information in a sentence. But in their study, not considered at all was the distance between words, an important feature when learning the local dependency to help understand the context of input text. We propose Distance-based Self-Attention Network, which considers the word distance by using a simple distance mask in order to model the local dependency without losing the ability of modeling global dependency which attention has inherent. Our model shows good performance with NLI data, and it records the new state-of-the-art result with SNLI data. Additionally, we show that our model has a strength in long sentences or documents.

Published: arXiv, December 6, 2017

Link:

http://www.zhuanzhi.ai/document/78436b0cb7b3f288ed5b0a4dbe516896
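
As a rough illustration of the distance-mask idea, the sketch below subtracts a penalty proportional to the word distance |i - j| from the attention logits, so nearby tokens are favored while distant tokens can still be attended to. The exact mask used in the paper may differ; the linear penalty and the alpha coefficient are assumptions made only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distance_masked_attention(X, Wq, Wk, Wv, alpha=0.5):
    """Self-attention whose logits are penalised by a simple distance term -alpha * |i - j|."""
    n, d_z = X.shape[0], Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])  # (n, n) word distances
    logits = Q @ K.T / np.sqrt(d_z) - alpha * dist                 # distant pairs are down-weighted
    return softmax(logits, axis=-1) @ V

n, d, d_z = 7, 8, 4
rng = np.random.default_rng(3)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_z)) for _ in range(3))
print(distance_masked_attention(X, Wq, Wk, Wv).shape)   # (7, 4)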

5. Improving Visually Grounded Sentence Representations with Self-Attention


Authors: Kang Min Yoo, Youhyun Shin, Sang-goo Lee

Affiliation: Seoul National University

Abstract: Sentence representation models trained only on language could potentially suffer from the grounding problem. Recent work has shown promising results in improving the qualities of sentence representations by jointly training them with associated image features. However, the grounding capability is limited due to distant connection between input sentences and image features by the design of the architecture. In order to further close the gap, we propose applying self-attention mechanism to the sentence encoder to deepen the grounding effect. Our results on transfer tasks show that self-attentive encoders are better for visual grounding, as they exploit specific words with strong visual associations.

Published: arXiv, December 2, 2017

Link:

http://www.zhuanzhi.ai/document/e6830a90f7825e921218c1344c74224d

6. DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding


Authors: Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, Chengqi Zhang

Affiliations: University of Technology Sydney, University of Washington

Abstract: Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)", is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.

Published: arXiv, November 21, 2017

Link:

http://www.zhuanzhi.ai/document/ffba9961c35cae407f5390dca0a3c627
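
The sketch below loosely combines the two ingredients named in the abstract: feature-wise (multi-dimensional) attention scores, a directional mask that restricts which token pairs may attend to each other, and a multi-dimensional source2token step that compresses the sequence into a single vector. DiSAN's actual alignment function, fusion gate, and mask conventions are not reproduced; everything here is a simplified illustration with random weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_multidim_attention(H, W1, W2, b, forward=True):
    """Simplified DiSAN-style block: per-feature alignment scores plus a directional mask,
    then a multi-dimensional source2token pooling to a single sentence vector."""
    n, d = H.shape
    # pairwise feature-wise scores f_ij in R^d (a simplified alignment function)
    f = np.tanh(H @ W1)[:, None, :] + np.tanh(H @ W2)[None, :, :] + b   # (n, n, d)
    # directional mask: token i may only attend to j > i (forward) or j < i (backward)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    mask = (i < j) if forward else (i > j)
    f = np.where(mask[:, :, None], f, -1e9)
    attn = softmax(f, axis=1)                          # per-feature weights over j
    U = np.einsum("ijd,jd->id", attn, H)               # (n, d) context for each token
    # multi-dim source2token attention compresses the sequence to one vector
    s = softmax(np.tanh(U), axis=0)                    # (n, d) per-feature weights over tokens
    return (s * U).sum(axis=0)                         # (d,) sentence embedding

n, d = 6, 8
rng = np.random.default_rng(4)
H = rng.normal(size=(n, d))
W1, W2, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
print(directional_multidim_attention(H, W1, W2, b).shape)   # (8,)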

7. Deep Semantic Role Labeling with Self-Attention


Authors: Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, Xiaodong Shi

Affiliations: Xiamen University, Tencent Technology Co

Abstract: Semantic Role Labeling (SRL) is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end SRL with recurrent neural networks (RNN) has gained increasing attention. However, it remains a major challenge for RNNs to handle structural information and long range dependencies. In this paper, we present a simple and effective architecture for SRL which aims to address these problems. Our model is based on self-attention which can directly capture the relationships between two tokens regardless of their distance. Our single model achieves F1 = 83.4 on the CoNLL-2005 shared task dataset and F1 = 82.7 on the CoNLL-2012 shared task dataset, which outperforms the previous state-of-the-art results by 1.8 and 1.0 F1 score respectively. Besides, our model is computationally efficient, and the parsing speed is 50K tokens per second on a single Titan X GPU.

Published: arXiv, December 5, 2017

Link:

http://www.zhuanzhi.ai/document/dabd379090a1173822f3f0745ae90717
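
To make the core building block concrete, here is a toy stand-in for an attentional tagger: one scaled dot-product self-attention layer, which lets every token attend to every other token regardless of distance, followed by a per-token softmax over labels. The real model stacks many such layers with additional components; the dimensions, tag count, and random weights below are placeholder assumptions for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_tagger(X, Wq, Wk, Wv, Wo, W_tag):
    """One self-attention layer followed by a per-token tag classifier."""
    d_z = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_z), axis=-1)     # every token attends to every token
    H = X + attn @ V @ Wo                               # residual connection
    return softmax(H @ W_tag, axis=-1)                  # (n, n_tags) label distributions

n, d, d_z, n_tags = 6, 8, 8, 5
rng = np.random.default_rng(5)
X = rng.normal(size=(n, d))                 # word (+ predicate-indicator) embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d_z)) for _ in range(3))
Wo, W_tag = rng.normal(size=(d_z, d)), rng.normal(size=(d, n_tags))
tags = self_attention_tagger(X, Wq, Wk, Wv, Wo, W_tag)
print(tags.shape, tags.sum(axis=1))         # (6, 5), each row sums to 1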

Originally published on the WeChat public account Zhuanzhi (Quan_Zhuanzhi)

Original publication date: 2018-03-12
