【Paper Recommendations】Five Recent Automatic Speech Recognition (ASR) Papers: Audio Adversarial Examples, Adversarial Voice Recognition, Acoustic Models, Sequence-to-Sequence Models, and Spoken-Intelligibility Remediation

【Overview】The Zhuanzhi (专知) content team has compiled five recent papers on Automatic Speech Recognition (ASR) and introduces them below. Enjoy!

1. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text



Authors: Nicholas Carlini, David Wagner

Abstract: We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative optimization-based attack to Mozilla's end-to-end DeepSpeech implementation, and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.

Venue: arXiv, January 6, 2018

Link:

http://www.zhuanzhi.ai/document/02b6a97d515f1ba5e9045bc90643ea6f
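
To make the attack concrete, below is a minimal PyTorch sketch of the kind of iterative optimization-based targeted attack the abstract describes: a perturbation added to the waveform is trained so that a CTC-based recognizer transcribes a chosen phrase, while the perturbation is clipped to stay small. The tiny convolutional "victim" model, the character inventory, and the distortion budget are placeholder assumptions; the paper itself attacks Mozilla's DeepSpeech.

```python
# Minimal sketch of a targeted audio adversarial attack (assumptions throughout).
import torch
import torch.nn as nn

torch.manual_seed(0)

num_chars = 29  # e.g. 26 letters + space + apostrophe + CTC blank (assumption)
victim = nn.Sequential(                    # toy stand-in for the real recognizer
    nn.Conv1d(1, 64, kernel_size=320, stride=160), nn.ReLU(),
    nn.Conv1d(64, num_chars, kernel_size=1),
)

def log_probs(waveform):
    """Waveform (samples,) -> (frames, batch=1, num_chars) log-probabilities."""
    feats = victim(waveform.view(1, 1, -1))        # (1, C, T)
    return feats.permute(2, 0, 1).log_softmax(-1)  # (T, 1, C)

x = 0.1 * torch.randn(16000)                  # stand-in for one second of real audio
target = torch.tensor([[8, 5, 12, 12, 15]])   # character indices of the chosen phrase
target_len = torch.tensor([5])

delta = torch.zeros_like(x, requires_grad=True)   # the adversarial perturbation
opt = torch.optim.Adam([delta], lr=1e-3)
ctc = nn.CTCLoss(blank=num_chars - 1)
eps = 0.05                                        # distortion budget (assumption)

for step in range(200):
    lp = log_probs(torch.clamp(x + delta, -1.0, 1.0))
    loss = ctc(lp, target, torch.tensor([lp.size(0)]), target_len)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)      # keep the perturbation nearly inaudible
```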

2. CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition



Authors: Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, Carl A. Gunter

Abstract: ASR (automatic speech recognition) systems like Siri, Alexa, Google Voice or Cortana have become quite popular recently. One of the key techniques enabling the practical use of such systems in people's daily life is deep learning. Though deep learning in computer vision is known to be vulnerable to adversarial perturbations, little is known about whether such perturbations remain effective against practical speech recognition. In this paper, we not only demonstrate that such attacks can happen in reality, but also show that the attacks can be systematically conducted. To minimize users' attention, we choose to embed the voice commands into a song, called a CommanderSong. In this way, the song carrying the command can spread through radio, TV or even any media player installed on portable devices like smartphones, potentially impacting millions of users over long distances. In particular, we overcome two major challenges: minimizing the revision of a song in the process of embedding commands, and letting the CommanderSong spread through the air without losing the voice "command". Our evaluation demonstrates that we can craft random songs to "carry" any commands and that the modification is extremely difficult to notice. Specifically, the physical attack, in which we play the CommanderSongs over the air and record them, succeeds 94% of the time.

Venue: arXiv, January 25, 2018

Link:

http://www.zhuanzhi.ai/document/ec3cf2da6bad04326cbc0f4dfd7314f0
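
The over-the-air requirement highlighted in the abstract can be illustrated with the following PyTorch sketch: the perturbation added to the carrier song is optimized through randomly simulated playback-and-recording channels, so that the hidden command still decodes after the song is played aloud. The stand-in acoustic model, the crude impulse-response-plus-noise channel, and the command indices are all assumptions and are far simpler than the paper's actual pipeline.

```python
# Minimal sketch of embedding a command into a song so it survives playback (assumptions throughout).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_chars = 29
asr = nn.Sequential(                    # toy stand-in for the target recognizer
    nn.Conv1d(1, 64, kernel_size=320, stride=160), nn.ReLU(),
    nn.Conv1d(64, num_chars, kernel_size=1),
)

def log_probs(wav):
    out = asr(wav.view(1, 1, -1))
    return out.permute(2, 0, 1).log_softmax(-1)   # (T, 1, C)

def random_channel(wav):
    """Crude playback/recording simulation: a random decaying impulse response
    plus additive noise (an assumption, not the paper's physical model)."""
    ir = torch.rand(64) * torch.exp(-torch.arange(64.0) / 16.0)
    echoed = F.conv1d(wav.view(1, 1, -1), (ir / ir.sum()).view(1, 1, -1),
                      padding=32).view(-1)[: wav.numel()]
    return echoed + 0.01 * torch.randn_like(wav)

song = 0.1 * torch.randn(16000)               # stand-in for the carrier song
command = torch.tensor([[15, 16, 5, 14]])     # character indices of the hidden command
cmd_len = torch.tensor([4])

delta = torch.zeros_like(song, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-3)
ctc = nn.CTCLoss(blank=num_chars - 1)

for step in range(200):
    loss = 0.0
    for _ in range(4):                        # average over random simulated channels
        lp = log_probs(random_channel(torch.clamp(song + delta, -1.0, 1.0)))
        loss = loss + ctc(lp, command, torch.tensor([lp.size(0)]), cmd_len)
    loss = loss / 4 + 10.0 * delta.pow(2).mean()   # keep the revision of the song small
    opt.zero_grad()
    loss.backward()
    opt.step()
```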

3. Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model



Authors: Sibo Tong, Philip N. Garner, Hervé Bourlard

Abstract: Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained to model a universal International Phonetic Alphabet (IPA)-based phone set using the CTC loss function. Learning Hidden Unit Contribution (LHUC) is investigated to perform language adaptive training. In addition, dropout during cross-lingual adaptation is also studied and tested in order to mitigate the overfitting problem. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC, and that it is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters shows consistent improvement on limited data. Applying dropout during adaptation can further improve the system and achieve competitive performance with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited data.

Venue: arXiv, January 23, 2018

Link:

http://www.zhuanzhi.ai/document/c40ff80e8ec1b8044c32cad289037b8b
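
A minimal sketch of the LHUC idea referenced in the abstract is given below: each language owns a small vector that rescales the hidden units of a shared CTC acoustic model, and during cross-lingual adaptation only those LHUC parameters (with dropout active) need to be updated. The layer sizes, IPA phone-set size, and dropout rate are assumptions rather than the paper's configuration.

```python
# Minimal sketch of LHUC-based language-adaptive training for a CTC acoustic model.
import torch
import torch.nn as nn

class LHUC(nn.Module):
    """Learning Hidden Unit Contributions: per-language rescaling of hidden units."""
    def __init__(self, hidden_dim, num_langs):
        super().__init__()
        # Initialised at zero so that 2 * sigmoid(0) = 1, i.e. no rescaling at first.
        self.r = nn.Parameter(torch.zeros(num_langs, hidden_dim))

    def forward(self, h, lang_id):
        return h * 2.0 * torch.sigmoid(self.r[lang_id])

class MultilingualCTC(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=256, num_ipa_phones=100, num_langs=4):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.lhuc = LHUC(hidden_dim, num_langs)
        self.drop = nn.Dropout(0.2)               # dropout during adaptation (see abstract)
        self.out = nn.Linear(hidden_dim, num_ipa_phones + 1)   # +1 for the CTC blank

    def forward(self, feats, lang_id):
        h, _ = self.encoder(feats)
        h = self.drop(self.lhuc(h, lang_id))
        return self.out(h).log_softmax(-1)        # frame-level IPA phone posteriors

model = MultilingualCTC()
feats = torch.randn(2, 50, 40)                    # (batch, frames, acoustic features)
print(model(feats, lang_id=3).shape)              # torch.Size([2, 50, 101])

# Cross-lingual adaptation: freeze the shared network and update only the LHUC parameters.
for p in model.parameters():
    p.requires_grad = False
model.lhuc.r.requires_grad = True
adapt_opt = torch.optim.SGD([model.lhuc.r], lr=0.1)
```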

4. State-of-the-art Speech Recognition With Sequence-to-Sequence Models



Authors: Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Abstract: Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In our previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We introduce a multi-head attention architecture, which offers improvements over the commonly used single-head attention. On the optimization side, we explore techniques such as synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500-hour voice search task, we find that the proposed changes improve the WER of the LAS system from 9.2% to 5.6%, while the best conventional system achieves 6.7% WER. We also test both models on a dictation dataset, where our model achieves 4.1% WER while the conventional system achieves 5% WER.

Venue: arXiv, January 19, 2018

Link:

http://www.zhuanzhi.ai/document/9942c891541525c55c47a8f9d557c1ee
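
As a rough illustration of the architecture described in the abstract, here is a tiny attention-based encoder-decoder with multi-head attention over the encoder states and a word-piece output vocabulary, run with teacher forcing. All dimensions and the word-piece inventory size are assumptions; this sketch does not include the optimization techniques (synchronous training, scheduled sampling, label smoothing, MWER) that the paper also relies on.

```python
# Minimal LAS-style encoder-decoder with multi-head attention (assumptions throughout).
import torch
import torch.nn as nn

class TinyLAS(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, wordpieces=4096, heads=4):
        super().__init__()
        self.hidden = hidden
        self.listener = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.embed = nn.Embedding(wordpieces, hidden)
        self.speller = nn.LSTMCell(hidden * 2, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.out = nn.Linear(hidden * 2, wordpieces)

    def forward(self, feats, prev_tokens):
        enc, _ = self.listener(feats)                         # (B, T, H) encoder states
        B = feats.size(0)
        h = feats.new_zeros(B, self.hidden)
        c = feats.new_zeros(B, self.hidden)
        context = feats.new_zeros(B, self.hidden)
        logits = []
        for u in range(prev_tokens.size(1)):                  # teacher forcing over word pieces
            emb = self.embed(prev_tokens[:, u])               # previous word piece
            h, c = self.speller(torch.cat([emb, context], dim=-1), (h, c))
            context, _ = self.attn(h.unsqueeze(1), enc, enc)  # multi-head attention
            context = context.squeeze(1)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)                     # (B, U, wordpieces)

model = TinyLAS()
feats = torch.randn(2, 100, 80)            # (batch, frames, log-mel features)
prev = torch.randint(0, 4096, (2, 12))     # shifted word-piece targets (teacher forcing)
print(model(feats, prev).shape)            # torch.Size([2, 12, 4096])
```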

5. Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art



Authors: Yuan Gao, Brij Mohan Lal Srivastava, James Salsman

Abstract: We use automatic speech recognition to assess spoken English learner pronunciation based on the authentic intelligibility of the learners' spoken responses, determined from support vector machine (SVM) classifier or deep learning neural network model predictions of transcription correctness. Using numeric features produced by PocketSphinx alignment mode and many recognition passes searching for the substitution and deletion of each expected phoneme and insertion of unexpected phonemes in sequence, the SVM models achieve 82 percent agreement with the accuracy of Amazon Mechanical Turk crowdworker transcriptions, up from 75 percent reported by multiple independent researchers. Using such features with SVM classifier probability prediction models can help computer-aided pronunciation teaching (CAPT) systems provide intelligibility remediation.

Venue: arXiv, January 26, 2018

Link:

http://www.zhuanzhi.ai/document/65cfc71d13f1f5d4a5e3ad37898fa916
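
The classification stage described in the abstract can be sketched with scikit-learn: numeric features derived from alignment passes are fed to an SVM whose calibrated probability output serves as an intelligibility score for remediation. The feature matrix below is a random placeholder; extracting real features from PocketSphinx alignment is not shown.

```python
# Minimal sketch of the SVM probability-prediction stage (placeholder data, assumptions throughout).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder per-utterance features (e.g. alignment scores and indicators of phoneme
# substitutions/deletions/insertions) and labels derived from crowdworker transcriptions
# (1 = response transcribed correctly, 0 = not).
X = rng.normal(size=(200, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# SVM with probability calibration; predict_proba gives an intelligibility score.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X[:150], y[:150])

scores = clf.predict_proba(X[150:])[:, 1]   # probability the response is intelligible
print(scores[:5].round(3))
```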

Originally published on the WeChat public account Zhuanzhi (Quan_Zhuanzhi).

Original publication date: 2018-02-05
