
Must-Read | An Awesome Roadmap of Classic Speech Recognition / Speech Synthesis Papers (1982-2018.5)

Posted by 用户7623498
Published 2020-08-04 16:49:13

Author: zzw922cn

Source: GitHub

URL: https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers

Translation & editing: 九三山人

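Introduction

Automatic speech recognition has been studied for decades, with models evolving from HMM-GMM systems to today's deep neural networks. This roadmap of papers is a good way to understand the history of speech recognition. It runs from traditional models to the models popular today, and covers not only acoustic models and ASR systems but also many interesting language models.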

To download the classic papers as a bundle, reply 20180822 in the official account's chat window.

Paper List

Automatic Speech Recognition (1982-2018.5)

  • An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1982), S. E. Levinson et al. [pdf]
  • A Maximum Likelihood Approach to Continuous Speech Recognition(1983), Lalit R. Bahl et al. [pdf]
  • Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1986), Andrew K. Halberstadt. [pdf]
  • Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahl et al. [pdf]
  • A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition(1989), Lawrence R Rabiner. [pdf]
  • Phoneme recognition using time-delay neural networks(1989), Alexander H. Waibel et al. [pdf]
  • Speaker-independent phone recognition using hidden Markov models(1989), Kai-Fu Lee et al. [pdf]
  • Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]
  • Connectionist Speech Recognition: A Hybrid Approach(1994), Herve Bourlard et al. [pdf]
  • A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)(1997), J.G. Fiscus. [pdf]
  • Speech recognition with weighted finite-state transducers(2001), M Mohri et al. [pdf]
  • Review of TDNN (Time Delay Neural Network) Architectures for Speech Recognition(2014), Masahide Sugiyama et al. [pdf]
  • Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]
  • Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]
  • The Kaldi speech recognition toolkit(2011), Daniel Povey et al. [pdf]
  • Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]
  • Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]
  • Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]
  • Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. [pdf]
  • Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
  • Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]
  • Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]
  • Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
  • Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]
  • Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]
  • Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]
  • Convolutional deep maxout networks for phone recognition(2014), László Tóth et al. [pdf]
  • Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]
  • Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]
  • Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]
  • End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]
  • First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]
  • Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]
  • Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]
  • Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]
  • Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]
  • A neural transducer(2015), N Jaitly et al. [pdf]
  • Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]
  • Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]
  • Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]
  • Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]
  • Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]
  • EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding(2015), Y Miao et al. [pdf]
  • Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]
  • Lexicon-Free Conversational Speech Recognition with Neural Networks(2015), Andrew L. Maas et al. [pdf]
  • Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]
  • Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]
  • Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]
  • End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]
  • Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]
  • End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]
  • Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]
  • Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]
  • Latent Sequence Decompositions(2016), William Chan et al. [pdf]
  • Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks(2016), Tara N. Sainath et al. [pdf]
  • Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition(2016), Suyoun Kim et al. [pdf]
  • Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al. [pdf]
  • Towards better decoding and language model integration in sequence to sequence models(2016), Jan Chorowski et al. [pdf]
  • Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. [pdf]
  • Very Deep Convolutional Networks for End-to-End Speech Recognition(2016), Yu Zhang et al. [pdf]
  • Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. [pdf]
  • Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. [pdf]
  • WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]
  • Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech(2017), Michael Neumann et al. [pdf]
  • An enhanced automatic speech recognition system for Arabic(2017), Mohamed Amine Menacer et al. [pdf]
  • Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM(2017), Takaaki Hori et al. [pdf]
  • A network of deep neural networks for distant speech recognition(2017), Mirco Ravanelli et al. [pdf]
  • An online sequence-to-sequence model for noisy speech recognition(2017), Chung-Cheng Chiu et al. [pdf]
  • An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems(2017), Hany Ahmed et al. [pdf]
  • Attention-Based End-to-End Speech Recognition in Mandarin(2017), C Shan et al. [pdf]
  • Building DNN acoustic models for large vocabulary speech recognition(2017), Andrew L. Maas et al. [pdf]
  • Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. [pdf]
  • Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments(2017), Zixing Zhang et al. [pdf]
  • English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. [pdf]
  • ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA(2017), Song Han et al. [pdf]
  • Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition(2017), Chris Donahue et al. [pdf]
  • Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. [pdf]
  • Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition(2017), Taesup Kim et al. [pdf]
  • Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling(2017), Hairong Liu et al. [pdf]
  • Improving the Performance of Online Neural Transducer Models(2017), Tara N. Sainath et al. [pdf]
  • Learning Filterbanks from Raw Speech for Phone Recognition(2017), Neil Zeghidour et al. [pdf]
  • Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. [pdf]
  • Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al. [pdf]
  • Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. [pdf]
  • Multilingual Speech Recognition With A Single End-To-End Model(2017), Shubham Toshniwal et al. [pdf]
  • Optimizing expected word error rate via sampling for speech recognition(2017), Matt Shannon. [pdf]
  • Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al. [pdf]
  • Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition(2017), Jaeyoung Kim et al. [pdf]
  • Reducing Bias in Production Speech Models(2017), Eric Battenberg et al. [pdf]
  • Robust Speech Recognition Using Generative Adversarial Networks(2017), Anuroop Sriram et al. [pdf]
  • State-of-the-art Speech Recognition With Sequence-to-Sequence Models(2017), Chung-Cheng Chiu et al. [pdf]
  • Towards Language-Universal End-to-End Speech Recognition(2017), Suyoun Kim et al. [pdf]
  • Accelerating recurrent neural network language model based online speech recognition system(2018), K Lee et al. [pdf]

Speaker Recognition (2000-2017)

  • Speaker Verification Using Adapted Gaussian Mixture Models(2000), Douglas A. Reynolds et al. [pdf]
  • A tutorial on text-independent speaker verification(2004), Frédéric Bimbot et al. [pdf]
  • Deep neural networks for small footprint text-dependent speaker verification(2014), E Variani et al. [pdf]
  • Deep Speaker Vectors for Semi Text-independent Speaker Verification(2015), Lantian Li et al. [pdf]
  • Deep Speaker: an End-to-End Neural Speaker Embedding System(2017), Chao Li et al. [pdf]
  • Deep Speaker Feature Learning for Text-independent Speaker Verification(2017), Lantian Li et al. [pdf]
  • Deep Speaker Verification: Do We Need End to End?(2017), Dong Wang et al. [pdf]
  • Speaker Diarization with LSTM(2017), Quan Wang et al. [pdf]
  • Text-Independent Speaker Verification Using 3D Convolutional Neural Networks(2017), Amirsina Torfi et al. [pdf]

Speech Synthesis (1993-2018)

  • Signal estimation from modified short-time Fourier transform(1993), Daniel W. Griffin et al. [pdf]
  • Text-to-speech synthesis(2009), Paul Taylor et al. [pdf]
  • A fast Griffin-Lim algorithm(2013), Nathanael Perraudin et al. [pdf]
  • First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention(2016), Wenfu Wang et al. [pdf]
  • Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer(2016), Xavi Gonzalvo et al. [pdf]
  • SampleRNN: An Unconditional End-to-End Neural Audio Generation Model(2016), Soroush Mehri et al. [pdf]
  • WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]
  • Char2Wav: End-to-end speech synthesis(2017), J Sotelo et al. [pdf]
  • Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al. [pdf]
  • Deep Voice 2: Multi-Speaker Neural Text-to-Speech(2017), Sercan Arik et al. [pdf]
  • Deep Voice 3: 2000-Speaker Neural Text-to-speech(2017), Wei Ping et al. [pdf]
  • Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions(2017), Jonathan Shen et al. [pdf]
  • Parallel WaveNet: Fast High-Fidelity Speech Synthesis(2017), Aäron van den Oord et al. [pdf]
  • Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework(2017), S Yang et al. [pdf]
  • Tacotron: Towards End-to-End Speech Synthesis(2017), Yuxuan Wang et al. [pdf]
  • Uncovering Latent Style Factors for Expressive Speech Synthesis(2017), Yuxuan Wang et al. [pdf]
  • VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop(2017), Yaniv Taigman et al. [pdf]
  • Neural Voice Cloning with a Few Samples(2018), Sercan O. Arık et al. [pdf]

Language Modeling (1992-2016)

  • Class-Based n-gram Models of Natural Language(1992), Peter F. Brown et al. [pdf]
  • An empirical study of smoothing techniques for language modeling(1996), Stanley F. Chen et al. [pdf]
  • A Neural Probabilistic Language Model(2000), Yoshua Bengio et al. [pdf]
  • A new statistical approach to Chinese Pinyin input(2000), Zheng Chen et al. [pdf]
  • Discriminative n-gram language modeling(2007), Brian Roark et al. [pdf]
  • Neural Network Language Model for Chinese Pinyin Input Method Engine(2015), S Chen et al. [pdf]
  • Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition(2016), Xie Chen et al. [pdf]
  • Exploring the limits of language modeling(2016), R Jozefowicz et al. [pdf]
  • On the State of the Art of Evaluation in Neural Language Models(2016), G Melis et al. [pdf]

Contact the Author

If you have any questions, feel free to email zzw922cn@gmail.com. Thank you!

Originally published: 2018-08-29. In case of infringement, contact cloudcommunity@tencent.com for removal.

This article was shared from the WeChat official account 决策智能与机器学习.
