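Title: Research and Implementation of a Voice Conversion System (语音音色变换系统研究与实现)
Author: 倪素萍 (Ni Suping)
Degree: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences (中国科学院声学研究所)
Degree-granting location: Institute of Acoustics, Chinese Academy of Sciences (中国科学院声学研究所)
Keywords: voice conversion; linear transformation; Mel cepstrum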
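Chinese abstract (translated): Voice conversion is an important branch of speech synthesis. Its goal is to transform the speech of one speaker (the source speaker) so that it sounds like the speech of another speaker (the target speaker). This thesis first implements a baseline system: line spectral frequency (LSF) parameters and fundamental frequency (F0) are transformed with a GMM-based linear transformation, and the resulting parameters are fed to an LPC synthesizer to obtain the converted speech. Building on the baseline, two improvements are made to raise synthesis quality and system performance. 1. Mixed excitation linear prediction (MELP) is introduced to overcome the baseline's single excitation source: its mixed excitation refines the excitation source, and adaptive spectral enhancement and pulse dispersion further improve the quality of the converted speech. The improved system adds a jitter factor (jitter), Fourier magnitude parameters (fsmag), and bandpass voicing parameters (bpvc). The parameters that refine the excitation source are transformed by codebook lookup. The mean opinion score (MOS) of the improved system is about 14% higher than that of the baseline system. 2. Mel cepstral parameters (c) and fundamental frequency (F0) are used as the conversion features and transformed with the GMM-based linear transformation. Speech is then synthesized from the transformed features with the Mel log spectrum approximation (MLSA) method. The system built on Mel cepstral conversion and synthesis achieves an MOS 11% higher than that of the baseline system. Finally, a demonstration system is implemented.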
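The codebook lookup mentioned for the excitation parameters can be illustrated with a minimal sketch. The abstract does not describe how the codebooks are built or searched, so everything here is an assumption: paired source and target codebooks, a squared Euclidean distance, and the hypothetical names `nearest_index` and `map_by_codebook`.

```python
def nearest_index(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def map_by_codebook(src_vec, src_codebook, tgt_codebook):
    """Replace a source parameter vector by the target entry paired with
    its nearest source codeword. Assumes entry i of both codebooks was
    trained on aligned source/target frames (an illustrative assumption)."""
    return tgt_codebook[nearest_index(src_vec, src_codebook)]


# Toy paired codebooks: two source codewords, each paired with a target codeword.
src_codebook = [[0.0, 0.0], [1.0, 1.0]]
tgt_codebook = [[10.0], [20.0]]
print(map_by_codebook([0.9, 0.8], src_codebook, tgt_codebook))  # [20.0]
```

A hard nearest-neighbour lookup like this is the simplest variant; weighted combinations of several target codewords are also common in the voice conversion literature.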
English abstract: Voice conversion is an important branch of speech synthesis. It modifies the utterance of one speaker (the source speaker) so that it sounds as if it had been spoken by another speaker (the target speaker). In this thesis, a baseline voice conversion system is implemented that transforms line spectral frequency (LSF) parameters and the pitch parameter (F0) by a GMM-based linear transformation and synthesizes speech from the transformed parameters with an LPC synthesizer. Two methods are then introduced to improve the baseline system's synthesis quality and performance. 1. The Mixed Excitation Linear Prediction (MELP) method is introduced to compensate for the limited quality of the LPC synthesizer's simple source model by using MELP's refined mixed pulse-and-noise excitation source, and to improve the quality of the synthesized speech through its adaptive spectral enhancement and pulse dispersion. Additional parameters, namely jitter, Fourier magnitudes, and bandpass voicing coefficients, are used to improve conversion quality. These parameters are transformed with a mapping codebook. The mean opinion score (MOS) of the improved system is 14% higher than that of the baseline system. 2. For comparison, Mel cepstral coefficients and pitch frequency are used as the conversion features. After the GMM-based linear transformation, speech is synthesized from the transformed coefficients with a Mel log spectrum approximation (MLSA) filter. A listening test shows that the MOS of this system is 11% higher than that of the baseline system. Finally, a demonstration system for voice conversion is implemented.
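The abstract does not give the exact form of the GMM-based linear transformation. A common formulation in the voice conversion literature converts a source vector x as a posterior-weighted sum of per-component linear maps, F(x) = Σ_i P(i|x) [μ_i^y + Σ_i^yx (Σ_i^xx)^-1 (x − μ_i^x)]. The sketch below illustrates that formulation under simplifying assumptions that are not taken from the thesis: diagonal covariances, pre-trained parameters passed in as plain lists, and hypothetical function and parameter names.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means_x, vars_x):
    """P(component i | x) under the source-side GMM (computed in the log domain)."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means_x, vars_x)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x, weights, means_x, vars_x, means_y, cov_yx):
    """Posterior-weighted sum of per-component linear maps.

    With diagonal covariances, each dimension d of component i uses the
    slope cov_yx[i][d] / vars_x[i][d] (an assumed simplification).
    """
    post = posteriors(x, weights, means_x, vars_x)
    y = [0.0] * len(x)
    for p, mx, vx, my, cyx in zip(post, means_x, vars_x, means_y, cov_yx):
        for d in range(len(x)):
            y[d] += p * (my[d] + cyx[d] / vx[d] * (x[d] - mx[d]))
    return y


# Single-component toy model: the conversion reduces to one affine map.
print(convert([2.0, -1.0],
              weights=[1.0],
              means_x=[[0.0, 0.0]], vars_x=[[1.0, 1.0]],
              means_y=[[1.0, 2.0]], cov_yx=[[0.5, 2.0]]))  # [2.0, 0.0]
```

In practice the joint source-target GMM would be trained on time-aligned feature pairs (for example LSF or Mel cepstral vectors), and full covariance matrices are often used instead of the diagonal simplification shown here.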
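Language: Chinese
Date made public: 2011-05-07
Pages: 67
Content type: Thesis
Source URL: [http://159.226.59.140/handle/311008/990]
Collection: Institute of Acoustics_Institute of Acoustics Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation
GB/T 7714
倪素萍 (Ni Suping). Research and Implementation of a Voice Conversion System (语音音色变换系统研究与实现) [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005.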