Research and Implementation of a Voice Conversion System

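Title	Research and Implementation of a Voice Conversion System
Author	倪素萍 (Ni Suping)
Degree type	Doctoral
Defense date	2005
Degree-granting institution	Institute of Acoustics, Chinese Academy of Sciences
Place conferred	Institute of Acoustics, Chinese Academy of Sciences
Keywords	voice conversion; linear transformation; Mel cepstrum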
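Abstract (translated from Chinese)	Voice conversion is an important branch of speech synthesis; its goal is to transform the speech of one speaker (the source speaker) so that it sounds like that of another speaker (the target speaker). This thesis first implements a baseline system: line spectral frequency (LSF) parameters and fundamental frequency (F0) are transformed by a GMM-based linear transformation, and the resulting parameters are fed to an LPC synthesizer to obtain the converted speech. Building on the baseline, two improvements are made to raise synthesis quality and system performance. 1. Mixed excitation linear prediction (MELP) is introduced to address the baseline's overly simple excitation source: its mixed excitation refines the excitation source, and adaptive spectral enhancement and pulse dispersion further improve the quality of the converted speech. The improved system adds a jitter factor (jitter), Fourier magnitude parameters (fsmag), and bandpass voicing parameters (bpvc); these additional excitation parameters are transformed by codebook lookup. The mean opinion score (MOS) of the improved system is about 14% higher than that of the baseline. 2. Mel cepstral coefficients (c) and fundamental frequency (F0) are used as the conversion features and transformed by the GMM-based linear transformation; the transformed features are then synthesized with the Mel log spectrum approximation (MLSA) method. The system built on Mel cepstral conversion and synthesis achieves an MOS 11% higher than the baseline. Finally, a demonstration system is implemented.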
English abstract	Voice conversion is an important branch of speech synthesis. It modifies the utterance of one speaker (the source speaker) so that it sounds as if it had been spoken by another speaker (the target speaker). In this thesis, a baseline voice conversion system is implemented that transforms line spectral frequency (LSF) parameters and the pitch parameter (F0) by a GMM-based linear transformation and synthesizes speech from the transformed parameters using an LPC synthesizer. Building on the baseline system, two methods are introduced to improve its synthesis quality and performance. 1. The Mixed Excitation Linear Prediction (MELP) method is introduced to compensate for the limited quality of the simple source model of the LPC synthesizer by using its refined mixed pulse-and-noise excitation source, and to improve the quality of the synthesized speech through its adaptive spectral enhancement and pulse dispersion. Additional parameters, namely jitter, Fourier magnitudes, and bandpass voicing coefficients, are used to improve conversion quality; these parameters are transformed with a mapping codebook. The mean opinion score (MOS) of the improved system is 14% higher than that of the baseline system. 2. For comparison, Mel cepstral coefficients and pitch frequency are used as the conversion features. After the GMM-based linear transformation, speech is synthesized from the transformed coefficients with a Mel log spectrum approximation (MLSA) filter. Listening tests show that the MOS of this system is 11% higher than that of the baseline system. A demonstration voice conversion system is also implemented.
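Language	Chinese
Release date	2011-05-07
Pages	67
Content type	Thesis
Source URL	[http://159.226.59.140/handle/311008/990]
Collection	Institute of Acoustics_Doctoral and Master's Theses of the Institute of Acoustics_Doctoral and Master's Theses 1981-2009
Recommended citation (GB/T 7714)	倪素萍 (Ni Suping). 语音音色变换系统研究与实现 [Research and Implementation of a Voice Conversion System][D]. Institute of Acoustics, Chinese Academy of Sciences. 2005.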
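
The abstract names a GMM-based linear transformation but this record does not give its formula. The sketch below illustrates one widely used form of such a conversion function, F(x) = sum_i p_i(x) [mu_y_i + (cov_yx_i / var_x_i)(x - mu_x_i)], where p_i(x) is the posterior probability of mixture component i given the source feature x. It is an assumption that the thesis uses this exact form. For brevity the example works on a single scalar feature (for instance one F0 value), whereas the thesis transforms LSF or Mel cepstral vectors. The function names, dictionary keys, and all numeric values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_linear_transform(x, components):
    """Map a scalar source feature x to the target speaker's feature space.

    Each component is a dict (hypothetical layout) with keys:
      weight  - mixture weight
      mu_x    - source mean
      mu_y    - target mean
      var_x   - source variance
      cov_yx  - covariance between target and source features
    """
    # Weighted likelihood of x under each component's source-side Gaussian.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # p_i(x)
        # Per-component linear regression from the source to the target feature.
        y += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y


# Toy two-component model with made-up parameters (not taken from the thesis).
toy_model = [
    {"weight": 0.5, "mu_x": 110.0, "mu_y": 190.0, "var_x": 100.0, "cov_yx": 80.0},
    {"weight": 0.5, "mu_x": 150.0, "mu_y": 240.0, "var_x": 100.0, "cov_yx": 80.0},
]
converted = gmm_linear_transform(120.0, toy_model)
```

In practice the component parameters would be estimated from time-aligned source and target frames, and the posterior-weighted sum lets the mapping vary smoothly across the feature space instead of switching abruptly between components.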
