Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method

Authors	Tao, Jianhua; Xin, Le; Yin, Panrong
Journal	IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Publication date	2009-03-01
Volume	17, Issue: 3, Pages: 469-477
Keywords	Fused hidden Markov model (HMM) inversion ; speech-driven facial animation ; unit concatenation ; visual speech synthesis
Abstract	This paper presents a realistic visual speech synthesis method based on hybrid concatenation. Unlike previous methods based on phoneme-level unit selection or a hidden Markov model (HMM), the hybrid concatenation method combines frame-level unit selection with a fused HMM, and is able to generate more expressive and stable facial animations. The fused HMM can explicitly model the loose synchronization of tightly coupled streams, with much better results than a normal HMM for audiovisual mapping. After the fused HMM is created, facial animation is generated via unit selection at the frame level, using the fused HMM output probabilities. To speed up unit selection on a large corpus, this paper also proposes a two-layer Viterbi search method in which only the subsets that have been selected in the first layer are further checked in the second layer. Using this idea, the system has been successfully integrated into real-time applications. Furthermore, the paper proposes a mapping method based on Gaussian mixture models (GMMs) to generate emotional facial expressions from neutral facial expressions. Final experiments show that the described method can output synthesized facial parameters with high quality. Compared with other audiovisual mapping methods, the method performs better with respect to expressiveness, stability, and system running speed.
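The two-layer search idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar frame features, the squared-error target cost (standing in for the fused HMM output probabilities), the join cost, and the names `two_layer_viterbi`, `k`, and `join_weight` are all assumptions made for illustration. The first layer shortlists a few candidate corpus frames per target frame; the second layer runs a Viterbi search over the shortlisted candidates only.

```python
from typing import List, Sequence


def two_layer_viterbi(
    targets: Sequence[float],
    corpus: Sequence[float],
    k: int = 3,
    join_weight: float = 1.0,
) -> List[int]:
    """Pick one corpus frame index per target frame (toy, scalar features)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not targets or not corpus:
        return []

    def target_cost(t: float, i: int) -> float:
        # Assumed stand-in for a model-based score: squared feature error.
        return (corpus[i] - t) ** 2

    def join_cost(i: int, j: int) -> float:
        # Assumed: frames adjacent in the corpus concatenate for free.
        if j == i + 1:
            return 0.0
        return join_weight * abs(corpus[j] - corpus[i])

    # Layer 1: keep only the k best candidates per target frame.
    shortlists = [
        sorted(range(len(corpus)), key=lambda i, t=t: target_cost(t, i))[:k]
        for t in targets
    ]

    # Layer 2: Viterbi search restricted to the shortlisted candidates.
    cost = [target_cost(targets[0], i) for i in shortlists[0]]
    back: List[List[int]] = []
    for f in range(1, len(targets)):
        prev = shortlists[f - 1]
        new_cost, pointers = [], []
        for j in shortlists[f]:
            totals = [cost[p] + join_cost(prev[p], j) for p in range(len(prev))]
            best = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[best] + target_cost(targets[f], j))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final candidate.
    pos = min(range(len(cost)), key=cost.__getitem__)
    path = [shortlists[-1][pos]]
    for f in range(len(back) - 1, -1, -1):
        pos = back[f][pos]
        path.append(shortlists[f][pos])
    path.reverse()
    return path


# Hypothetical data: three target frames, five corpus frames.
print(two_layer_viterbi([0.1, 1.1, 2.1], [0.0, 1.0, 2.0, 3.0, 10.0], k=2))
# prints [0, 1, 2]
```

With N corpus frames, the second layer evaluates on the order of k² transitions per frame instead of N², which is the kind of saving the pruning is meant to provide; the price is that a path passing through a frame dropped in the first layer can no longer be found.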
WOS headings	Science & Technology ; Technology
WOS category	Acoustics ; Engineering, Electrical & Electronic
WOS research area	Acoustics ; Engineering
WOS keywords	HIDDEN MARKOV-MODELS ; ANIMATION ; CONVERSION ; FACE
Indexed by	SCI
Language	English
WOS accession number	WOS:000263639400007
Content type	Journal article
Source URL	[http://ir.ia.ac.cn/handle/173211/3236]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Human-Computer Speech Interaction Team
Author affiliation	Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100080, Peoples R China
Recommended citation (GB/T 7714)	Tao, Jianhua, Xin, Le, Yin, Panrong. Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17(3): 469-477.
APA	Tao, Jianhua, Xin, Le, & Yin, Panrong. (2009). Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 17(3), 469-477.
MLA	Tao, Jianhua, et al. "Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method". IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 17.3 (2009): 469-477.