Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech


Authors	Li, Ya; Tao, Jianhua; Hirose, Keikichi; Xu, Xiaoying; Lai, Wei
Journal	SPEECH COMMUNICATION
Publication date	2015-09-01
Volume	72	Pages: 59-73
Keywords	Prosody; Stress; Hierarchical Modeling; Fujisaki Model; Speech Synthesis
Document subtype	Article
Abstract	Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of the pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced in modeling the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network-based prosodic variation model are proposed and then used in the stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macro- and micro-characteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems. (C) 2015 Elsevier B.V. All rights reserved.
WOS keywords	SPEAKER ADAPTATION ; EMOTIONAL SPEECH ; CONTEXT ; ALGORITHM ; CONTOURS
WOS research areas	Acoustics ; Computer Science
Language	English
WOS accession number	WOS:000359169000005
Content type	Journal article
Source URL	[http://ir.ia.ac.cn/handle/173211/40846]
Collection	National Laboratory of Pattern Recognition_Intelligent Interaction
Recommended citation
GB/T 7714	Li, Ya, Tao, Jianhua, Hirose, Keikichi, et al. Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech[J]. SPEECH COMMUNICATION, 2015, 72: 59-73.
APA	Li, Ya, Tao, Jianhua, Hirose, Keikichi, Xu, Xiaoying, & Lai, Wei. (2015). Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech. SPEECH COMMUNICATION, 72, 59-73.
MLA	Li, Ya, et al. "Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech". SPEECH COMMUNICATION 72 (2015): 59-73.
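
Note on the abstract: the statement that the pitch contour is generated by the superposition of two-level commands can be illustrated with the textbook form of the Fujisaki model, in which log F0 is a base value plus phrase-command and tone-command components. The sketch below is only an illustration of that general formulation, not the authors' implementation. The function names, the constants `alpha=3.0`, `beta=20.0`, `gamma=0.9` (commonly quoted defaults), and the command timings and amplitudes in the usage lines are all assumptions; the record does not say how the hierarchical stress model sets the commands.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the tone control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_f0, phrase_cmds, tone_cmds):
    """ln F0(t) = ln Fb + sum of phrase components + sum of tone components.

    phrase_cmds: list of (onset_time, amplitude) impulses.
    tone_cmds:   list of (onset_time, offset_time, amplitude) pulses;
                 amplitudes may be negative, as is usual for Mandarin tones.
    """
    value = math.log(base_f0)
    for onset, amp in phrase_cmds:
        value += amp * phrase_response(t - onset)
    for onset, offset, amp in tone_cmds:
        value += amp * (tone_response(t - onset) - tone_response(t - offset))
    return value


def f0_contour(times, base_f0, phrase_cmds, tone_cmds):
    """Evaluate F0 in Hz at each time point (seconds)."""
    return [math.exp(log_f0(t, base_f0, phrase_cmds, tone_cmds)) for t in times]


# Hypothetical commands: one phrase impulse and two tone pulses.
times = [i * 0.01 for i in range(100)]
contour = f0_contour(
    times,
    base_f0=100.0,
    phrase_cmds=[(0.0, 0.5)],
    tone_cmds=[(0.20, 0.45, 0.4), (0.55, 0.80, -0.3)],
)
```

Under this formulation, a stronger stress could be rendered by scaling the amplitude of the corresponding tone command, but that mapping is a guess for illustration and is not described in the record.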