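Title: 小资源中文语音合成研究 (Research on Chinese Speech Synthesis with Limited Resources)
Author: 易立夫 (Yi Lifu)
Degree: Doctorate
Defense date: 2003
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: speech synthesis; layered corpus design; prosody model; HNM synthesis algorithm; voice conversion; GMM
Alternative title: Chinese Speech Synthesis with Limited Resources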
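Chinese abstract (translated): Speech synthesis is an important branch of speech signal processing, and research on it benefits human-computer interaction, speech analysis, speech coding, speech enhancement, speech recognition, and related fields. Chinese speech synthesis based on large corpora has reached a fairly high level, but small-resource synthesis systems have not been studied in comparable depth. For Chinese speech synthesis systems with limited resources, this dissertation proposes a new layered corpus design method. The method combines the advantages of data-driven and rule-driven corpus design and yields an efficient synthesis corpus under limited resources. With a relatively small speech database, it maintains a certain degree of prosodic coverage while giving the synthesized speech fairly high naturalness. Based on statistics and analysis of a larger corpus, the dissertation adopts a C4.5 decision tree as the pitch prediction model. The model handles sample selection from the layered speech database well and considerably improves the naturalness of the original speech synthesis system, which selected samples by rule. The dissertation also discusses the application of the HNM algorithm to Chinese synthesis systems. Initial experiments and tests of the algorithm in Chinese speech synthesis have been completed, extending the range of prosodic control. HNM outperforms the previously used TD-PSOLA algorithm in both prosody control and spectral modification. After reviewing other voice conversion methods, the dissertation presents a voice conversion method based on Gaussian mixture models. According to the preliminary experimental results, the method can convert speech between specific speakers with good sound quality. It can be combined with Chinese speech synthesis systems that concatenate waveforms using syllables as units, and could find fairly wide practical use in speech synthesis.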
English abstract: Speech synthesis is an important branch of speech signal processing. Its progress can greatly promote research on and applications of human-computer interaction, speech analysis, speech coding, speech enhancement, speech recognition, and related fields. At present, speech synthesis based on large corpora has developed to a relatively high level, but systems with limited resources have received less attention. This dissertation proposes a layered corpus design method for Chinese speech synthesis with limited resources. The method integrates the advantages of data-driven and rule-based corpus design and achieves efficient corpus design under limited-resource conditions. It can cover a given prosodic range and achieve comparatively high naturalness of the synthesized speech for a small TTS system. Based on statistics and analysis of a large corpus, the dissertation adopts a C4.5 decision tree as the prosody model for selecting samples from the layered corpus, which has noticeably improved the naturalness of the original embedded TTS system. The application of the HNM algorithm in Chinese speech synthesis systems is also discussed. Preliminary tests of this method in a Chinese speech synthesis system have clearly extended the range of prosody control. Compared with TD-PSOLA, HNM is better for prosody control and spectrum modification. A voice conversion method based on the Gaussian mixture model is presented after a review of related work. Preliminary tests have shown that the method is suitable for voice conversion between specific speakers and yields good quality. The method can be combined with syllable-based waveform-concatenation Chinese speech synthesis systems and could be widely applied in practical speech synthesis.
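Language: Chinese
Release date: 2011-05-07
Pages: 76
Content type: Dissertation
Source URL: [http://159.226.59.140/handle/311008/1066]
Collection: Institute of Acoustics_Institute of Acoustics Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses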
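Recommended citation
GB/T 7714
易立夫 (Yi Lifu). 小资源中文语音合成研究 [Research on Chinese Speech Synthesis with Limited Resources][D]. Institute of Acoustics, Chinese Academy of Sciences. 2003.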