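Title: Research on New Methods for Highly Robust Speech Recognition (高鲁棒性语音识别新方法研究)
Author: Chen Jingdong (陈景东)
Degree type: Doctor of Engineering
Defense date: 1998-06-01
Degree-granting institution: Institute of Automation, Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Advisors: Huang Taiyi (黄泰翼); Ma Songde (马颂德)
Alternative title: New Approaches for Robust Speech Recognition
Degree discipline: Pattern Recognition and Intelligent Systems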
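Abstract (translated from Chinese): Many current speech recognition systems already achieve very high recognition rates when training and test conditions match. In practical applications, however, speaker variation, changes in environmental conditions, noise, and similar factors cause recognition performance to degrade sharply. How to make speech recognition systems more resistant to changes in environment and speaker has therefore become the widely studied robustness problem in speech recognition, and it is generally acknowledged to be a bottleneck of the field. With the goal of improving robustness and moving speech recognition technology toward practical use, this thesis pursues two lines of research: first, practical adaptation algorithms that improve the robustness of cepstrum+HMM speech recognition systems; second, new highly robust speech features that represent the characteristics of the speech signal accurately and thereby address robustness at its root. On this basis, several targeted solutions are proposed.
On adaptation methods, the thesis first classifies and summarizes existing adaptation methods according to the source of interference, then analyzes theoretically and evaluates experimentally a number of methods with good application prospects, and proposes improvements that address their shortcomings. These methods include cepstral coefficient weighting, cepstral coefficient postfiltering, cepstral normalization, RASTA filtering, J-RASTA filtering, spectral subtraction, VQ codebook adaptation, HMM parameter adaptation, and canonical-correlation-based spectral transformation compensation (CCBC). The main improvements to existing algorithms are as follows. For cepstral coefficient weighting, we revise the three weighting functions of the method proposed by Rabiner and propose a binomial weighting function. For cepstral normalization, we improve on the long-term averaging method used in most of the literature and propose an iterative method and a short-term averaging method. The CCBC method, which we originally proposed mainly for noise compensation, is extended here into a general adaptation technique that can compensate for the loss of recognition performance caused by simultaneous mismatch in noise, channel, and speaker.
On new highly robust speech features, the thesis investigates three directions: higher-order spectra, scale-invariant transform domains, and time-frequency analysis.
To exploit the higher-order statistical properties of speech signals in recognition, and thereby improve system performance and robustness, the thesis studies the bispectrum of speech signals and proposes a new bispectrum-based speech feature, BMFC (bi-mel-scale frequency cepstrum), which is validated experimentally on a telephone speech recognition task undertaken by our group. The experimental results show that the BMFC feature effectively improves system performance and is much more robust to white noise than MFCC.
To reduce the degradation in recognition performance caused by differences in vocal tract length between speakers, the thesis proposes a new speech feature based on the Mellin transform (abbreviated MMTLS) together with a digital implementation. Because the Mellin trans…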
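As an illustration of the cepstral normalization mentioned in the abstract, the following is a minimal sketch of the baseline long-term averaging approach only, not of the iterative or short-term variants the thesis proposes. The function name `cepstral_mean_normalize` and the toy values are hypothetical. It relies on the standard observation that a fixed linear channel shows up as a constant additive offset in the cepstral domain, so subtracting the per-coefficient mean over all frames removes it.

```python
def cepstral_mean_normalize(frames):
    """Long-term-average CMN: subtract each coefficient's mean over all frames.

    frames: list of equal-length lists, one list of cepstral coefficients per frame.
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    # Mean of each cepstral coefficient taken across the whole utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(num_coeffs)]
    return [[f[k] - means[k] for k in range(num_coeffs)] for f in frames]


# Toy data (hypothetical values): 3 frames, 2 coefficients each.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# Assumed channel effect: a constant offset added to every frame.
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)

# After normalization the two sequences should coincide up to rounding.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(norm_clean, norm_distorted)
    for a, b in zip(fa, fb)
)
print(max_diff)
```

Because the mean is taken over the whole utterance, this variant needs the full recording before it can normalize any frame, which is one plausible motivation for shorter-window alternatives.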
English abstract: Most current speech recognizers achieve very high recognition accuracy under controlled conditions. However, their performance degrades significantly when there is a mismatch between the training and operating environments. The mismatch is mainly caused by three distortion sources: changes in noise level, changes in input channel, and differences among speakers. To minimize the effects of these three distortion sources, this thesis investigates a variety of techniques for robust speech recognition. The techniques can be briefly summarized as follows.
● To improve the robustness of the cepstrum+HMM based speech recognizer, a number of useful adaptation techniques are investigated and modified, including CMN, RASTA, cepstral coefficient postfiltering, cepstral coefficient weighting, spectral subtraction, adaptation via VQ prototype modification, adaptation via HMM parameter modification, and the canonical correlation based compensation (CCBC) method. In particular, the CCBC method, previously proposed for noise compensation, has been modified to remove the undesirable effects of noise, channel changes, and speaker differences simultaneously.
● Techniques that use a signal's higher-order statistics (HOS) can reveal information about non-Gaussian signals and nonlinearities that cannot be obtained with conventional techniques. This information may be useful for speech recognition because it may provide clues about how to construct new features that are more robust to noise than the currently used cepstrum. This dissertation describes an investigation into the use of HOS techniques in speech recognition; in particular, it proposes a new feature based on the bispectrum of the speech signal, called the bi-mel-scale cepstrum (BMFC). Preliminary experiments on a telephone speech recognizer show that BMFC improves performance effectively and is more robust to white noise than the widely used MFCC.
● One major source of interspeaker variability in speaker-independent speech recognition is variation in vocal tract shape, especially vocal tract length (VTL), among individual speakers. If the vocal tract is modeled as a uniform tube of length L, then the formant frequencies of utterances of a given sound are proportional to 1/L. Since the VTL can vary from approximately 13 cm for females to over 18 cm for males, formant center frequencies can vary by as much as 25% among speakers. This source of variability causes state-of-the-art speaker-independent speech recognizers to work poorly for outlier speakers whose vocal tract lengths differ significantly from those of the speakers in the training set. In an effort to reduce the degradation in speech recognition performance caused by variation in the VTL among speakers, a new feat…
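The uniform-tube reasoning in the English abstract can be checked with a little arithmetic. The sketch below assumes the textbook quarter-wavelength model of a tube closed at one end, F_n = (2n - 1) c / (4L), and a speed of sound of 35000 cm/s; the abstract itself only states that formants are proportional to 1/L, so both the formula and the constant are assumptions made here for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value, not taken from the thesis


def tube_formants(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


short_tract = tube_formants(13.0)  # VTL the abstract quotes for female speakers
long_tract = tube_formants(18.0)   # VTL the abstract quotes for male speakers

# Under this model every formant scales by the same factor, 13/18.
ratios = [low / high for high, low in zip(short_tract, long_tract)]

for n, (high, low, ratio) in enumerate(zip(short_tract, long_tract, ratios), start=1):
    print(f"F{n}: {high:.0f} Hz (13 cm) vs {low:.0f} Hz (18 cm), ratio {ratio:.3f}")
```

Every formant scales by the same factor of 13/18, about 0.72, which is a shift of the same order as the 25% figure quoted in the abstract. A uniform frequency scaling of this kind is what a scale-invariant feature would be designed to absorb.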
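Language: Chinese
Other identifier: 446
Content type: Thesis
Source URL: [http://ir.ia.ac.cn/handle/173211/5682]
Collection: Graduates_Doctoral dissertations
Recommended citation
GB/T 7714
Chen Jingdong (陈景东). Research on New Methods for Highly Robust Speech Recognition (高鲁棒性语音识别新方法研究) [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences. 1998.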