专栏首页专知【论文推荐】最新6篇机器翻译相关论文—词性和语义标注任务、变分递归神经机器翻译、文学语料、神经后缀预测、重构模型

【论文推荐】最新6篇机器翻译相关论文—词性和语义标注任务、变分递归神经机器翻译、文学语料、神经后缀预测、重构模型

【导读】专知内容组整理了最近六篇机器翻译(Machine Translation)相关文章,为大家进行介绍,欢迎查看!

1. Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks(评估神经机器翻译的表示层在词性和语义标注任务下能力)



作者:Yonatan Belinkov,Lluís Màrquez,Hassan Sajjad,Nadir Durrani,Fahim Dalvi,James Glass

摘要:While neural machine translation (NMT) models provide improved translation quality in an elegant, end-to-end framework, it is less clear what they learn about language. Recent work has started evaluating the quality of vector representations learned by NMT models on morphological and syntactic tasks. In this paper, we investigate the representations learned at different layers of NMT encoders. We train NMT systems on parallel data and use the trained models to extract features for training a classifier on two tasks: part-of-speech and semantic tagging. We then measure the performance of the classifier as a proxy to the quality of the original NMT model for the given task. Our quantitative analysis yields interesting insights regarding representation learning in NMT models. For instance, we find that higher layers are better at learning semantics while lower layers tend to be better for part-of-speech tagging. We also observe little effect of the target language on source-side representations, especially with higher quality NMT models.

期刊:arXiv, 2018年1月24日

网址

http://www.zhuanzhi.ai/document/b9b8847279bdeffdd79e705d8924fb4a

2. Variational Recurrent Neural Machine Translation(变分递归神经机器翻译)



作者:Jinsong Su,Shan Wu,Deyi Xiong,Yaojie Lu,Xianpei Han,Biao Zhang

摘要:Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Different from the variational NMT, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way, instead of a single latent variable. Specifically, the latent random variables are included into the hidden states of the NMT decoder with elements from the variational autoencoder. In this way, these variables are recurrently generated, which enables them to further capture strong and complex dependencies among the output translations at different timesteps. In order to deal with the challenges in performing efficient posterior inference and large-scale training during the incorporation of latent variables, we build a neural posterior approximator, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both the conventional and variational NMT models.

期刊:arXiv, 2018年1月16日

网址

http://www.zhuanzhi.ai/document/134ec3e6bca0ee744054d5a7c3f3b01f

3. What Level of Quality can Neural Machine Translation Attain on Literary Text?(神经机器翻译能在文学语料上表现出什么样的水平?)



作者:Antonio Toral,Andy Way

摘要:Given the rise of a new approach to MT, Neural MT (NMT), and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of twelve widely known novels spanning from the the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in a 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that between 17% and 34% of the translations, depending on the book, produced by NMT (versus 8% and 20% with PBSMT) are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.

期刊:arXiv, 2018年1月16日

网址

http://www.zhuanzhi.ai/document/20a7227320b44ac162c6ec60dac869db

4. Improved English to Russian Translation by Neural Suffix Prediction(利用神经后缀预测来增强英语向俄语的翻译)



作者:Kai Song,Yue Zhang,Min Zhang,Weihua Luo

摘要:Neural machine translation (NMT) suffers a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages. To address this problem, previous work focused on adjusting translation granularity or expanding the vocabulary size. However, morphological information is relatively under-considered in NMT architectures, which may further improve translation quality. We propose a novel method, which can not only reduce data sparsity but also model morphology through a simple but effective mechanism. By predicting the stem and suffix separately during decoding, our system achieves an improvement of up to 1.98 BLEU compared with previous work on English to Russian translation. Our method is orthogonal to different NMT architectures and stably gains improvements on various domains.

期刊:arXiv, 2018年1月11日

网址

http://www.zhuanzhi.ai/document/0a7eb9fede3215e57d9c25764d4ec440

5. Translating Pro-Drop Languages with Reconstruction Models(用重构模型来翻译Pro-Drop类型的语言)



作者:Longyue Wang,Zhaopeng Tu,Shuming Shi,Tong Zhang,Yvette Graham,Qun Liu

摘要:Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives, in terms of reconstruction scores, the parameters associated with the NMT model are guided to produce enhanced hidden representations that are encouraged as much as possible to embed annotated DP information. Experimental results on both Chinese-English and Japanese-English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.

期刊:arXiv, 2018年1月10日

网址

http://www.zhuanzhi.ai/document/b7a486c97594594dc2f8a85de6de6fd7

6. MIZAN: A Large Persian-English Parallel Corpus(MIZAN:一个大型的波斯英语平行语料库)



作者:Omid Kashefi

摘要:One of the most major and essential tasks in natural language processing is machine translation that is now highly dependent upon multilingual parallel corpora. Through this paper, we introduce the biggest Persian-English parallel corpus with more than one million sentence pairs collected from masterpieces of literature. We also present acquisition process and statistics of the corpus, and experiment a base-line statistical machine translation system using the corpus.

期刊:arXiv, 2018年1月7日

网址

http://www.zhuanzhi.ai/document/d96d21e6ef0306adbdcc6b8b6282b87d

本文分享自微信公众号 - 专知(Quan_Zhuanzhi),作者:专知内容组(编)

原文出处及转载信息见文内详细说明,如有侵权,请联系 yunjia_community@tencent.com 删除。

原始发表时间:2018-01-26

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • 【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

    【导读】专知内容组整理了最近五篇信息抽取(Information Extraction)相关文章,为大家进行介绍,欢迎查看! 1.Joint Recogniti...

    WZEARW
  • 【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

    【导读】专知内容组整理了最近五篇度量学习(Metric Learning )相关文章,为大家进行介绍,欢迎查看! 1.Mining on Manifolds: ...

    WZEARW
  • 【论文推荐】最新5篇推荐系统相关论文—文档向量矩阵分解、异构网络融合、树结构深度模型、深度强化学习、负二项矩阵分解

    【导读】专知内容组整理了最近五篇推荐系统(Recommender System)相关文章,为大家进行介绍,欢迎查看! 1. ParVecMF: A Paragr...

    WZEARW
  • Declarative Programming: Is It A Real Thing?

    由于合作方希望能以英文形式发布,故以后top的译文看时间而定,没时间就不再尝试翻译(而且本来水平也不咋地),仅保留原文于此。本次是一篇关于声明式编程的讨论文章,...

    汐楓
  • IP形象世界杯设定-WeFriends

    ? 腾讯ISUX isux.tencent.com 社交用户体验设计 ? 在2018年俄罗斯世界杯到来之际,我们策划了一项品牌宣传活动。将SNG集团旗下的6...

    腾讯ISUX
  • 分布式系统的一些阅读笔记

    distributed systems high level:scalability, availability, performance, latency a...

    哒呵呵
  • POJ 2209 The King(简单贪心)

    The King Time Limit: 2000MS Memory Limit: 65536K Total Submissions: 7499...

    Angel_Kitty
  • 袖珍分布式系统(一)

    本文是Distributed systems for fun and profit的第一部分,本文是阅读该文后的一些记录。

    zhuanxu
  • 【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

    【导读】专知内容组整理了最近五篇信息抽取(Information Extraction)相关文章,为大家进行介绍,欢迎查看! 1.Joint Recogniti...

    WZEARW
  • 【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

    【导读】专知内容组整理了最近五篇度量学习(Metric Learning )相关文章,为大家进行介绍,欢迎查看! 1.Mining on Manifolds: ...

    WZEARW

扫码关注云+社区

领取腾讯云代金券