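Title: Application of Speech Recognition in Xinjiang "Bilingual" Teaching Software
Author: Li Kai (李凯)
Degree type: Master's
Defense date: 2009-06
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place conferred: Beijing
Advisor: Jiang Tonghai (蒋同海)
Keywords: speech recognition; hidden Markov model; Mel-frequency cepstral coefficients; Gaussian mixture model; adaptive adjustment
Degree discipline: Computer Application Technology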
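Chinese abstract (in translation): In recent years, with the rapid development of Xinjiang's economy, exchanges and ties between Xinjiang and inland China have grown steadily broader and deeper. For Xinjiang's ethnic minority population, however, the language barrier hinders these exchanges. The most effective solution is to raise the Chinese proficiency of minority teachers and students, and fixing the pronunciation problems found in current language learning is a key part of improving "bilingual" teaching in Xinjiang. Speech recognition technology can recognize a learner's Chinese pronunciation and compute its accuracy, helping the learner master accurate pronunciation; tone recognition technology can help learners with the tones of their pronunciation, which is very helpful for minority students. This thesis first sets out the key ideas and methods of speech recognition from a theoretical standpoint. Then, using hidden Markov models (HMMs), Gaussian mixture models, and context-dependent Chinese triphone modeling within the speech recognition framework of the HTK platform, it builds a speaker recognition system based on the 863 speech corpus. It examines the effect of monophone and triphone models on the recognition system and finds that triphone models clearly raise the recognition rate. It also compares the effect of different numbers of Gaussian mixture components and finds that the recognition rate rises clearly as the number increases, but that beyond a certain point the gain slows while the required model training time keeps growing. Through experiments, a balance point between the number of mixture components and training time was found. A dedicated speech corpus of Chinese spoken by Xinjiang ethnic minority speakers was built, and part of the data was labeled. Because the speaker recognition system built on the 863 speech corpus did not perform well on Chinese spoken by Xinjiang minority speakers, an adaptive adjustment technique was introduced: the labeled data from the minority speech corpus was used to adjust the acoustic models of the 863-based system, which yielded a clear improvement in system performance.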
English abstract: With the rapid development of Xinjiang's economy, communication between Xinjiang and inland China has become increasingly frequent and wide-ranging. For local minority people, however, language is a barrier to this communication. The most effective solution is to improve the spoken Chinese of minority teachers and students, and correcting pronunciation problems is one of the most important aspects of language learning. Speech recognition technology can recognize a learner's Mandarin pronunciation and compute its accuracy, helping the learner improve. Tone recognition checks learners' tones and tells them whether the tones are correct, which matters because Mandarin tones are the issue minority students find most difficult and confusing. In this thesis, we first set out the key ideas and methods of speech recognition from a theoretical standpoint. We then use hidden Markov models (HMMs), Gaussian mixture models, and context-dependent triphone models within an HTK-based framework to build a speaker recognition system based on the 863 speech corpus. We compare monophone and triphone models, and the results show that triphone models give a marked improvement in recognition rate. We also test the effect of different numbers of Gaussian mixture components and find that the recognition rate improves as the number increases, but that beyond a certain point the gains slow, while the time needed to train the models keeps growing. Through experiments we find a balance point between the number of mixture components and training time. We build a dedicated speech corpus of Chinese spoken by Uygur minority people and label part of the data. Because the system built on the 863 speech corpus performs poorly on Chinese spoken by Uygur minority people, we introduce an adaptation technique: we use the labeled data from the Uygur speech corpus to adjust the models of the 863-based speaker recognition system, and find that the recognition rate rises significantly.
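The abstract contrasts monophone models with context-dependent triphone models. As a rough illustration of what that change means for the model inventory, the sketch below expands a monophone sequence into triphone names using the `left-center+right` naming that HTK uses. The function name `to_triphones` and the example phone sequence are hypothetical and are not taken from the thesis; the sketch also treats every symbol as an ordinary phone and does not special-case silence or word boundaries.

```python
def to_triphones(phones):
    """Expand a monophone sequence into HTK-style triphone names.

    Each phone is named "left-center+right". At the edges of the
    sequence the missing context is simply omitted, giving
    "center+right" at the start and "left-center" at the end.
    (Illustrative helper only; not the thesis's actual tooling.)
    """
    triphones = []
    last = len(phones) - 1
    for i, center in enumerate(phones):
        name = center
        if i > 0:
            # Prepend the left context.
            name = f"{phones[i - 1]}-{name}"
        if i < last:
            # Append the right context.
            name = f"{name}+{phones[i + 1]}"
        triphones.append(name)
    return triphones


# Hypothetical initial/final sequence for the syllables "ni hao".
print(to_triphones(["n", "i", "h", "ao"]))
# ['n+i', 'n-i+h', 'i-h+ao', 'h-ao']
```

Because each phone now gets a separate model for every distinct pair of neighbours, the number of models to train grows sharply compared with the monophone case, which is one reason training cost becomes a concern alongside the number of Gaussian mixture components.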
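Date made public: 2014-10-14
Content type: Thesis
Source URL: [http://ir.xjipc.cas.cn/handle/365002/3581]
Collection: Xinjiang Technical Institute of Physics and Chemistry_Multilingual Information Technology Research Laboratory
Author affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Li Kai. Application of Speech Recognition in Xinjiang "Bilingual" Teaching Software[D]. Beijing. Graduate University of Chinese Academy of Sciences. 2009.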