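Title: Research on the Application of Neural Networks in Continuous Speech Recognition
Author: Zhou Jianlai (周健来)
Degree type: Doctoral
Defense date: 1999
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Degree-granting location: Institute of Acoustics, Chinese Academy of Sciences
Keywords: speech recognition; neural network; post-processor; two-pass recognition strategy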
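Abstract (translated from the Chinese): Over the past two decades, speech recognition technology has advanced rapidly, but many problems remain unsolved. At present, HMM-based statistical acoustic models have become the mainstream technique for building speech recognition systems. At the same time, because neural networks have good static pattern classification ability, they have achieved great success in phoneme recognition. However, owing to inherent shortcomings of neural networks, their classification ability has not been exploited effectively in continuous speech recognition. This dissertation starts from that point: it takes the HMM as the framework and combines it with neural networks to improve the performance of the acoustic model. Toward this goal, the research covers the relevant theory of HMMs and neural networks and also proposes a new method for combining them in continuous speech recognition. The dissertation is organized as follows. Chapter 1 is a survey. It reviews the history of speech recognition, summarizes the overall framework of current speech recognition technology and recent research in some of its components, and discusses and analyzes several major techniques. Chapter 2 discusses the basic theory of the HMM and the modifications needed when it is applied to speech recognition. It also discusses three different forms of HMM and gives conclusions on their respective advantages and disadvantages. Chapter 3 describes our baseline recognition system in detail. In this system, VQ, DHMM, and the One-Pass search algorithm form a relatively simple acoustic recognizer, which serves as the experimental platform and basis for the further research in this dissertation. Chapter 4 first analyzes in depth how neural networks are applied to phoneme recognition and points out the limitations of existing techniques in training speed, training data volume, and related aspects. To address these limitations, the author proposes a fast algorithm that improves neural network training so that the network converges quickly when trained on large amounts of data. Experiments show that the new algorithm generally converges 3 to 4 times as fast as the traditional BP algorithm. Chapter 4 also discusses how neural networks are applied in continuous speech recognition and points out the defects of traditional HMM/ANN hybrid recognition systems. Because of these defects, the traditional HMM/ANN hybrid method is not suitable as the acoustic model of a modern speech recognition system. Chapter 5, based on an analysis of the baseline system's results and drawing on some of this laboratory's results in isolated word recognition, proposes a two-pass recognition strategy that uses a neural network as a post-processor, and designs four groups of experiments to verify its effectiveness. The experimental results show that this method significantly improves the recognition rate of our baseline system. Finally, we summarize the work of this dissertation and outline directions for future work.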
Abstract (English): Although research in speech recognition has made great progress over the past two decades, many problems in the field remain unsolved. At present, the statistical acoustic model based on the hidden Markov model (HMM) is the mainstream technique for building speech recognition systems. On the other hand, because of its powerful ability to classify static patterns, the neural network (NN) has succeeded in phoneme recognition. Due to its inherent flaws, however, there has been no effective method that exploits the classification ability of the NN in continuous speech recognition. In this dissertation, the author aims to improve the performance of the HMM-based acoustic model by combining it with a neural network. The work covers the theory of HMMs and neural networks and presents a new approach that combines HMM with NN for continuous speech recognition. The dissertation is organized as follows. Chapter 1 summarizes the history of speech recognition. The author reviews the overall framework of speech recognition and some of the key advances in several areas. At the same time, some of the main techniques are discussed briefly. In Chapter 2 the author introduces the basic theory of the HMM and discusses some implementation issues. Furthermore, three kinds of HMM are compared with each other, and conclusions about their advantages and disadvantages are given. In Chapter 3, a baseline speech recognition system is described in detail. In this system, a simple acoustic recognizer is constructed from VQ, DHMM, and the One-Pass search algorithm. The baseline system is the basis of the research work in this dissertation. In Chapter 4, the application of neural networks in speech recognition is discussed first. Second, the author presents a new learning algorithm for NNs to replace the traditional method, which is too slow to be used when a large amount of speech data is available. Lastly, the conventional hybrid HMM/NN speech recognition system is analyzed and the drawbacks of this kind of system are identified. As a result, the conventional hybrid system is not suitable for modern speech recognition systems. In Chapter 5, the author puts forward a new strategy for the acoustic model. The neural network is used as a post-processor that classifies the speech data segmented by the HMM recognizer. Major issues, such as how to use the HMM segmentation information in the neural network, the structure of the neural network, the choice of the error metric for training the neural network, and the determination of the training procedure, are investigated in a set of experiments. In these experiments, we attempt to recognize 68 phoneme-like units in continuous speech. Our results indicate that this is a promising method: it improves the recognition accuracy of our baseline speaker-independent speech recognition system by about 20%. Finally, the work in this dissertation is summarized and prospects for research in the near future are outlined.
Language: Chinese
Release date: 2011-05-07
Pages: 99
Content type: Dissertation
Source URL: http://159.226.59.140/handle/311008/612
Collection: Institute of Acoustics_Institute of Acoustics Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
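Recommended citation
GB/T 7714
Zhou Jianlai. Research on the Application of Neural Networks in Continuous Speech Recognition [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 1999.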