Modeling speaker variability using long short-term memory networks for speech recognition
Li, Xiangang ; Wu, Xihong
2015
Abstract: Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LSTM RNNs are used for extracting d-vectors (deep vectors), which are then concatenated with the raw features for acoustic models. The speaker information provided by d-vectors helps DNNs based acoustic models figure out the speaker normalization during training. Furthermore, motivated by the idea that the speech message can also be useful for speaker recognition, a new network called cross-LSTM is proposed, which consists of two LSTMs: one for classifying speakers and the other for classifying senones. As a result, speaker recognition and speech recognition are conducted simultaneously. Experiments are conducted on a conversational telephone speech corpus. Experimental results show the proposed models are effective for alleviating speaker variability in ASR, and yield 6% relative improvement for the LSTMP RNNs based systems. Copyright © 2015 ISCA.; EI; 1086-1090; 2015-January
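The concatenation step mentioned in the abstract (one d-vector appended to the raw features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `append_dvector`, the 40-dimensional frames, and the 8-dimensional d-vector are all assumptions chosen for illustration, and the d-vector is treated as already extracted.

```python
from typing import List, Sequence


def append_dvector(
    frames: Sequence[Sequence[float]], d_vector: Sequence[float]
) -> List[List[float]]:
    """Append the same speaker-level d-vector to every acoustic frame.

    `frames` is a sequence of per-frame feature vectors; `d_vector` is a
    single fixed-length speaker representation (assumed precomputed).
    """
    return [list(frame) + list(d_vector) for frame in frames]


# Hypothetical sizes: 5 frames of 40 raw features, an 8-dimensional d-vector.
frames = [[0.0] * 40 for _ in range(5)]
d_vector = [1.0] * 8

augmented = append_dvector(frames, d_vector)
print(len(augmented), len(augmented[0]))
```

Each augmented frame carries the raw features followed by the speaker vector, so the acoustic model input width grows by the d-vector length while the number of frames is unchanged.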
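Language: English
Source: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Content type: Other
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/436888]
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)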
Recommended citation
GB/T 7714
Li, Xiangang, Wu, Xihong. Modeling speaker variability using long short-term memory networks for speech recognition. 2015-01-01.