Accepted Papers at Interspeech 2018, a Top Speech Conference!

Interspeech 2018

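Interspeech is the top academic conference in the speech field, organized by the International Speech Communication Association (ISCA), and the world's largest comprehensive technical event on speech information processing. The conference encourages interdisciplinary research in speech, in particular the study and application of rapidly developing artificial intelligence and machine learning techniques. Interspeech 2018 will be held in early September 2018 in Hyderabad, India.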

Paper ID

Title

Authors

27

Binaural Speech Intelligibility Estimation Using Deep Neural Networks

Kazuhiro Kondo, Kazuya Taira and Yosuke Kobayashi

34

Real-Time Scoring of an Oral Reading Assessment on Mobile Devices

Jian Cheng

38

Conditional End-to-End Audio Transformations

Albert Haque, Michelle Guo and Prateek Verma

40

Speech recognition for medical conversations

Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang

41

Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification

Lanhua You, Wu Guo, Yan Song and Sheng Zhang

42

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

Iroro Orife

43

Frequency domain variants of velvet noise and their application to speech processing and synthesis

Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino

45

A novel normalization method for autocorrelation function for pitch detection and for speech activity detection

Qiguang Lin and Yiwen Shao

46

Dithered Quantization for Frequency-Domain Speech and Audio Coding

Tom Bäckström, Johannes Fischer and Sneha Das

47

Categorical vs Dimensional Perception of Italian Emotional Speech

Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller

48

Cross-language perception of Mandarin lexical tones by Mongolian-speaking bilinguals in the Inner Mongolia Autonomous Region, China

Kimiko Tsukada and Yu Rong

51

The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats

Björn Schuller, Stefan Steidl, Anton Batliner, Peter Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian Pokorny, Eva-Maria Rathner, Karin Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou

52

Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

Emre Yilmaz, Henk van den Heuvel and David van Leeuwen

57

Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features

Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz

60

The Trajectory of Voice Onset Time with Vocal Aging

Chen Xuanda, Xiong Ziyu and Hu Jian

61

Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons

Moez Ajili, Jean-Francois Bonastre and Solange Rossato

62

Entity-Aware Language Model as an Unsupervised Reranker

Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy

63

Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments

John Novak and Robert Kenyon

65

The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework

Erica Gold, Sula Ross and Kate Earnshaw

67

Articulatory Features for ASR of Pathological Speech

Emre Yilmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco

68

Vowel space as a tool to evaluate articulation problems

Rob van Son, Catherine Middag and Kris Demuynck

69

Performance Analysis of the 2017 NIST Language Recognition Evaluation

Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason and Jaime Hernandez-Cordero

70

Gated Convolutional Neural Network for Sentence Matching

Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You

73

COSMO SylPhon: a model to assess phonological learning

Jean-Luc Schwartz

78

Active Memory Networks for Language Modeling

Oscar Chen, Anton Ragni, Mark Gales and Xie Chen

79

Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models

Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu

83

Deep Speech Denoising with Vector Space Projections

Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni

84

What to Expect from Expected Kneser-Ney Smoothing

Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang

91

Emotional Prosody Perception in Mandarin-speaking Congenital Amusics

Yixin Zhang, Tianzhu Geng and Jinsong Zhang

92

Analysis of Length Normalization in End-to-End Speaker Verification System

Weicheng Cai, Jinkun Chen and Ming Li

97

Overview of the 2018 Spoken CALL Shared Task

Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei

990

Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks

Yun Wang, Juncheng Li and Florian Metze

991

Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach

Ragesh Rajan M, Ashwin Vijayakumar and Deepu Vijayasenan

993

Attentive Statistics Pooling for Deep Speaker Embedding

Koji Okabe, Takafumi Koshinaka and Koichi Shinoda

995

UltraFit: A speaker-friendly headset for ultrasound recordings in speech sciences

Lorenzo Spreafico, Michael Pucher and Anna Matosova

996

Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech

Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller

999

Articulatory-to-speech conversion using bi-directional long short-term memory

Fumiaki Taguchi and Tokihiko Kaburagi

1000

The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task

Kay Berkling, Cem Philipp Freimoser, Mario Kunstek and Jülg Dominik

1007

Follow-up Question Generation using Pattern-based Seq2seq with a Small Corpus for Interview Coaching

Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong and Huai-Hung Huang

1010

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li

1013

Capsule Networks for Low Resource Spoken Language Understanding

Vincent Renkens and Hugo Van hamme

1015

Learning Discriminative Features for Speaker Identification and Verification

Sarthak Yadav and Atul Rai

1016

LSTM based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language

Laxmi Pandey and Karan Nathwani

1018

Detection of glottal closure instants in degraded speech using single frequency filtering analysis

Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana

1019

Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis

Simone Hantke, Christoph Stemp and Björn Schuller

1020

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao

1021

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

Yerbolat Khassanov and Eng Siong Chng

1023

MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

Wenhao Ding and Liang He

1024

Effective acoustic cue learning is not just statistical, it is discriminative

Jessie S. Nixon

1025

Compression of End-to-End Models

Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu

1026

Postfiltering with Complex Spectral Correlations for Speech and Audio Coding

Sneha Das and Tom Bäckström

1027

Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding

Sneha Das and Tom Bäckström

1030

Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu

1032

Discriminating between nasals and approximants in English language using zero time windowing

RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana

1034

Scalable Factorized Hierarchical Variational Autoencoder Training

Wei-Ning Hsu and James Glass

1035

Contextual Slot Carryover for Disparate Schemas

Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert and Ruhi Sarikaya

1037

Stream Attention for Distributed Multi-Microphone Speech Recognition

Xiaofei Wang, Ruizhi Li and Hynek Hermansky

1038

Articulatory consequences of vocal effort elicitation method

Elisabet Eir Cortes, Marcin Wlodarczak and Juraj Šimko

1039

Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding

Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan

1042

Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis

Yuanjun Zhao, Roberto Togneri and Victor Sreeram

1043

Designing a Pneumatic Bionic Voice Prosthesis - Statistical Approach for Source Excitation Generation

Farzaneh Ahmadi and Tomoki Toda

1044

Training Utterance-level Embedding Networks for Speaker Identification and Verification

Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim and Jonghun Park

1046

Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement

Ching-Hua Lee, Bhaskar D. Rao and Harinath Garudadri

1047

Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions

Okko Räsänen, Seshadri Shreyas and Marisa Casillas

1049

Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning

ShiLiang Zhang and Ming Lei

1054

Towards a better characterization of Parkinsonian speech: a multidimensional acoustic study

Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie Van Malderen and Bernard Harmegnies

1055

Low-Latency Neural Speech Translation

Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel

1057

Structured Word Embedding for Low Memory Neural Network Language Model

Kaiyu Shi and Kai Yu

1058

An End-to-End Text-Independent Speaker Identification System on Short Utterances

Ruifang Ji, Xinyuan Cai and Xu Bo

1059

Dysarthric speech classification using glottal features computed from non-words, words and sentences

Narendra N P and Paavo Alku

1060

Length contrast and covarying features: Whistled speech as a case study

Rachid Ridouane, Giuseppina Turco and Julien Meyer

1062

On the Usefulness of the Speech Phase Spectrum for Pitch Extraction

Erfan Loweimi, Jon Barker and Thomas Hain

1063

Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription

Rongfeng Su, Xunying Liu and Lan Wang

1065

Regional variation of /r/ in Swiss German dialects

Adrian Leemann, Stephan Schmid, Dieter Studer-Joho and Marie-José Kolly

1070

i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models

Karel Beneš, Santosh Kesiraju and Lukáš Burget

1074

Structural effects on properties of consonantal gestures in Tashlhiyt

Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane

1076

General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats

Gábor Gosztolya, Tamás Grósz and László Tóth

1078

Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces

László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó

1079

Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech

Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann

1080

Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model

Keisuke Tanihara, Shogo Yonekura and Yasuo Kuniyoshi

1081

Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling

Siyuan Feng and Tan Lee

1085

Automatic Speech Recognition System Development in the "Wild"

Anton Ragni and Mark Gales

1086

Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin

Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu

1087

A deep learning approach to assessing non-native pronunciation of English using phone distances

Konstantinos Kyriakopoulos, Kate Knill and Mark Gales

1088

The Conversation Continues: The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition

Odette Scharenborg and Martha Larson

1089

Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition

Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin

1093

The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech

Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller

1096

Punctuation Prediction Model for Conversational Speech

Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak

1097

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

Wei-Ning Hsu, Hao Tang and James Glass

1098

Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning

Gabriel Mittag and Sebastian Möller

1099

The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching

Victor Soto, Nishi Cestero and Julia Hirschberg

1100

Play Duration based User-Entity Affinity Modeling in Spoken Dialog System

Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan and Abishek Ravi

1102

Analysis of Complementary Information Sources in the Speaker Embeddings Framework

Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson

1103

Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings

Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu

1105

Estimation of the Vocal Tract Length of Vowel Sounds based on the Frequency of the Significant Spectral Valley

TV Ananthapadmanabha and Ramakrishnan AngaraiGanesan

1107

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Shiyu Zhou, Dong Linhao, Shuang Xu and Bo Xu

1108

Tongue Segmentation with Geometrically Constrained Snake Model

Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang and Kiyoshi Honda

1110

L2-ARCTIC: a non-native English speech corpus

Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna

1111

Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition

Yike Zhang, Pengyuan Zhang and Yonghong Yan

1113

Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder

Kei Akuzawa, Yusuke Iwasawa and Yutaka Matsuo

1114

A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement

Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu and Benoit Champagne

1115

A comparison of input types to a deep neural network-based forced aligner

Matthew C. Kelley and Benjamin V. Tucker

1120

Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection

Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das

1121

Voice Conversion with Conditional SampleRNN

Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco and Dan Darcy

1122

Contextual Language Model Adaptation for Conversational Agents

Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow

1124

Improved ASR for under-resourced languages through Multi-task Learning with Acoustic Landmarks

Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen

1125

Self-similarity matrix based intelligibility assessment of cleft lip and palate speech

Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat

1126

Formant measures of vowels adjacent to alveolar and retroflex consonants in Arrernte: stressed and unstressed position

Marija Tabain, Richard Beare and Andrew Butcher

1128

Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection

Madhusudan Singh and Debadatta Pati

1130

Dialect-geographical Acoustic-Tonetics: five disyllabic tone sandhi patterns in cognate words from the Wu dialects of Zhèjiāng province

Phil Rose

1131

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder

Berrak Sisman, Mingyang Zhang and Haizhou Li

1132

Emotion Recognition from Human Speech Using Temporal Information and Deep Learning

John Kim and Rif A. Saurous

1134

Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations

Aaron Nicolson and Kuldip K. Paliwal

1135

Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor

Disong Wang and Yuexian Zou

1138

Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification

Zhang Teng, Kailai Zhang and Ji Wu

1139

Rapid Collection of Spontaneous Speech Corpora using Telephonic Community Forums

Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif and Roni Rosenfeld

1140

Monoaural Audio Source Separation using Variational Autoencoders

Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri

1143

Deep learning techniques for koala activity detection

Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe

1147

Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination

Jindrich Matousek and Daniel Tihelka

1149

User Information Augmented Semantic Frame Parsing using Progressive Neural Networks

Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin

1150

A Shifted Delta Coefficient Objective for Monaural Speech Separation using Multi-task Learning

Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li

1151

Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection

Youngmoon Jung, Younggwan Kim, Yeunju Choi and Hoirin Kim

1152

Temporal transformer networks for acoustic scene classification

Zhang Teng, Kailai Zhang and Ji Wu

1153

State Gradients for RNN Memory Analysis

Lyan Verwimp, Hugo Van hamme, Vincent Renkens and Patrick Wambacq

1154

Waveform-Based Speaker Representations for Speech Synthesis

Moquan Wan, Gilles Degottex and Mark Gales

1156

Leveraging Second-Order Log-Linear model for improved deep learning based ASR performance

Ankit Raj, Shakti Rath and Jithendra Vepa

1158

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification

Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Dan Povey

1159

Word Emphasis Prediction for Expressive Text to Speech

Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki

1160

Forward-Backward Attention Decoder

Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara

1162

Active Learning for LF-MMI Trained Neural Networks in ASR

Yanhua Long, Hong Ye, Yijie Li and Jiaen Liang

1165

Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal

Lukas Mateju, Petr Cerva, Jindrich Zdansky and Radek Safarik

1171

Homophone Identification and Merging for Code-switched Speech Recognition

Brij Mohan Lal Srivastava and Sunayana Sitaram

1173

Improved Epoch Extraction from Telephonic Speech using Chebfun and Zero Frequency Filtering

Ganga Gowri B, Soman K.P and Govind D

1174

Using pupillometry to measure the cognitive load of synthetic speech

Avashna Govender and Simon King

1176

Resyllabification in Indian Languages and its Implications in Text-to-speech Systems

Mahesh M, Jeena JPrakash and Hema Murthy

1178

Code-switching in Indic Speech Synthesisers

Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy

1182

Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer

Siyuan Feng and Tan Lee

1185

GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages

Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan

1188

Transcription correction for Indian languages using acoustic signatures

Jeena JPrakash, Golda Brunet Rajan and Hema Murthy

1190

WaveNet Vocoder with Limited Training Data for Voice Conversion

Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai

1198

Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis

Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai

1199

Measuring the cognitive load of synthetic speech using a dual task paradigm

Avashna Govender and Simon King

1202

Phoneme-to-Articulatory mapping using bidirectional gated RNN

Théo Biasutto--Lervat and Slim Ouni

1203

Information Bottleneck based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts

Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy

1204

Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting

Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao

1205

Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures

Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu

1209

Triplet loss based cosine similarity metric learning for text-independent speaker recognition

Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov and Ivan Kremnev

1210

Collapsed speech segment detection and suppression for WaveNet vocoder

Yichiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda

1211

Data augmentation improves recognition of foreign accented speech

Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata

1212

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition

Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney

1214

Exploration of Local Speaking Rate Variations in Mandarin Read Speech

Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen

1222

An Active Feature Transformation Method For Attitude Recognition of Video Bloggers

Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz

1223

A New Framework for Supervised Speech Enhancement in the Time Domain

Ashutosh Pandey and Deliang Wang

1224

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

Rong Gong and Xavier Serra

1225

Vowels and Diphthongs in Hangzhou Wu Chinese Dialect

Yang Yue and Fang Hu

1226

Speaker Embedding Extraction with Phonetic Information

Yi Liu, Liang He, Jia Liu and Michael T. Johnson

1227

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa

1230

Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku

1232

S4D: Speaker Diarization Toolkit in Python

Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive and Sylvain Meignier

1233

Age-related effects on sensorimotor control of speech production

Anne Hermes, Jane Mertens and Doris Mücke

1234

Single-channel Speech Dereverberation via Generative Adversarial Training

Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu

1237

Biophysically-inspired features improve the generalizability of neural network-based speech enhancement systems

Deepak Baby and Sarah Verhulst

1238

Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?

Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André

1239

Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion

Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato and Shogo Minagi

1240

On Learning to Identify Genders from Raw Speech Signal using CNNs

Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai Doss

1241

Neural Language Codes for Multilingual Acoustic Models

Markus Müller, Sebastian Stüker and Alex Waibel

1242

An Attention Pooling based Representation Learning Method for Speech Emotion Recognition

Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai

1243

Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition

Liwen Zhang

1244

Learning to adapt: a meta-learning approach for speaker adaptation

Ondrej Klejch, Joachim Fainberg, Peter Bell and Steve Renals

1245

Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners

Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo and Jan W. H. Schnupp

1246

Co-whitening of i-vectors for short and long duration speaker verification

Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang

1247

Training Augmentation using Adversarial Examples for Robust Speech Recognition

Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie

1248

Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter

Hong Liu, Haipeng Lan, Bing Yang and Cheng Pang

1250

Data independent sequence augmentation method for acoustic scene classification

Zhang Teng, Kailai Zhang and Ji Wu

1251

Pitch-Adaptive Front-end Feature for Hypernasality Detection

Akhilesh Dubey, S R Mahadeva Prasanna and Samarendra Dandapat

1252

ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge

Zbynek Zajic, Marie Kunesova, Jan Zelinka and Marek Hrúz

1254

A first investigation of the timing of turn-taking in Ruuli

Tuarik Buanzur, Margaret Zellers, Saudah Namyalo and Alena Witzlack-Makarevich

1256

Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s

Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel

1258

Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

Kanru Hua

1259

A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model

Sreeram Ganji and Rohit Sinha

1262

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe

1264

Perceptual and automatic evaluations of the intelligibility of speech degraded by noise induced hearing loss simulation

Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier

1265

Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen

1266

Automatic Evaluation of Speech Intelligibility based on i-vectors in the context of Head and Neck Cancers

Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard

1267

Automatic Pronunciation Evaluation of Singing

Chitralekha Gupta, Haizhou Li and Ye Wang

1269

Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network

Weipeng He, Petr Motlicek and Jean-Marc Odobez

1270

Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment

Yujia Xiao, Frank Soong and Wenping Hu

1271

Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia

Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier

1272

Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function

Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna

1280

A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions

Luciana Ferrer and Mitchell McLaren

1281

Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method

Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng

1283

Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning

Wenda Chen, Mark Hasegawa-Johnson and Nancy Chen

1284

Should code-switching models be asymmetric?

Barbara Bullock, Wally Guzman, Jacqueline Serigos and Almeida Jacqueline Toribio

1285

Visual timing information in audiovisual speech perception: evidence from lexical tone contour

Hui Xie, Biao Zeng and Rui Wang

1286

A Weighted Superposition of Functional Contours model for modelling contextual prominence of elementary prosodic contours

Branislav Gerazov, Gerard Bailly and Yi Xu

1288

An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder

Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee

1291

Multi-resolution gammachirp envelope distortion index for intelligibility prediction of noisy speech

Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani

1293

A Case Study on the Importance of Belief State Representation for Dialogue Policy Management

Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou

1294

Speech Enhancement Using the Minimum-Probability-of-Error Criterion

Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula

1295

Learning Structured Dictionaries for Exemplar-based Voice Conversion

Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna

1296

Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks

Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler and Emanuël Habets

1297

Exploration of Compressed ILPR Features for Replay Attack Detection

Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha

1298

Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition

Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee

1299

A Compact and Discriminative Feature based on Auditory Summary Statistics for Acoustic Scene Classification

Hongwei Song, Jiqing Han and Shiwen Deng

1301

Multi-channel Attention for End-to-End Speech Recognition

Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu

1302

BUT system for low resource Indian language ASR

Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiat, Lukas Burget and Jan Černocký

1305

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer

Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen

1306

Acoustic-dependent phonemic transcription for text-to-speech synthesis

Kévin Vythelingum, Yannick Estève and Olivier Rosec

1308

Unsupervised Word Segmentation from Speech with Attention

Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier

1309

Liulishuo's System for the Spoken CALL Shared Task 2018

Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu

1310

Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events

Gurunath Reddy M, K Sreenivasa Rao and Partha Pratim Das

1312

Impact of ASR Performance on Free Speaking Language Assessment

Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines

1313

A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis

Kai-Zhan Lee, Erica Cooper and Julia Hirschberg

1316

Data requirements, selection and augmentation for DNN-based speech synthesis from crowdsourced data

Markus Toman, Geoffrey Meltzner and Rupal Patel

1318

Semi-supervised learning for information extraction from dialogue

Anjuli Kannan, Kai Chen, Alvin Rajkomar and Diana Jaunzeikare

1319

Anomaly Detection Approach for Pronunciation Verification of Disordered Speech using Speech Attribute Features

Mostafa Shahin, Beena Ahmed, Jim Ji and Kirrie Ballard

1320

Prosodic Focus Acquisition in French Early Cochlear Implanted Children

Chadi Farah, Stephane Roman and Mariapaola D'Imperio

1326

Low-Resource Speech-to-Text Translation

Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater

1327

Stochastic Shake-Shake Regularization for Affective Learning from Speech

Che-Wei Huang and Shrikanth Narayanan

1328

An Optimization Based Approach for Solving Spoken CALL Shared Task

Mohammad Ateeq, Abualsoud Hanani and Aziz Qaroush

1331

Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge

Claude Montacié and Marie-José Caraty

1333

Statistical Model Compression for Small-Footprint Natural Language Understanding

Grant Strimel, Kanthashree Mysore Sathyendra and Stanislav Peshterliev

1336

Automatically measuring L2 speech fluency without the need of ASR: a proof-of-concept study with Japanese learners of French

Lionel Fontan, Maxime Le Coz and Sylvain Detey

1339

A GPU-based WFST Decoder with Exact Lattice Generation

Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur

1342

Adding New Classes Without Access to the Original Training Data with Applications to Language Identification

Hagai Taitelbaum, Ehud Ben-Reuven and Jacob Goldberger

Full list of accepted papers:

http://interspeech2018.org/accepted-papers.html

Originally published on the WeChat official account Zhuanzhi (Quan_Zhuanzhi)

Original publication date: 2018-06-10
