Multimodal Voice Activity Detection (多模式语音端点检测)


	LIU Peng (刘鹏) ; WANG Zuoying (王作英)
	2010-06-09 ; 2010-06-09
Keywords	speech recognition ; voice activity detection ; multimodal ; classification code TN912.4
Alternative title	Multimodal voice activity detection
Abstract	In speech signal processing systems, frame-energy-based voice activity detection (VAD) is often degraded by background noise and by the non-stationary energy of speech segments. To improve the performance and robustness of VAD, this paper introduces visual information. Visual features are generated by a data-driven linear transformation. Two single-modality VAD systems are built on a proposed general statistical VAD model and are then combined by a two-step fusion method into a multimodal VAD system. Experiments show that the multimodal VAD, which uses both audio and visual information, achieves a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate compared with the frame-energy-based audio VAD. These results indicate that the multimodal VAD method avoids most sentence-breaking errors and significantly improves frame-level detection performance. ; Funding: National "863" High-Tech Program (2001AA114071)
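Language	Chinese
Content type	Journal article
Source URL	[http://hdl.handle.net/123456789/54812]
Collection	Tsinghua University
Recommended citation GB/T 7714	LIU Peng, WANG Zuoying. Multimodal voice activity detection[J], 2010.
APA	LIU Peng, & WANG Zuoying. (2010). Multimodal voice activity detection.
MLA	LIU Peng, et al. "Multimodal voice activity detection." (2010).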
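
The abstract compares against a frame-energy-based audio VAD and reports results as relative reductions in error rate. The record does not give the authors' implementation, so the following is only a minimal sketch of what such a baseline and such a metric could look like. The function names, the frame length of 160 samples, and the energy threshold of 0.01 are all illustrative assumptions, not values from the paper.

```python
# Hypothetical illustration only: a plain frame-energy VAD baseline.
# frame_len and threshold are assumed values, not taken from the paper.

def frame_energies(samples, frame_len=160):
    """Split samples into non-overlapping frames; return mean energy per frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=160, threshold=0.01):
    """Label a frame True (speech) when its mean energy exceeds the threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


def frame_error_rate(predicted, reference):
    """Fraction of frames whose predicted label differs from the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must be non-empty")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


def relative_reduction(baseline_error, new_error):
    """Relative error reduction; 0.55 corresponds to a 55% relative drop."""
    return (baseline_error - new_error) / baseline_error
```

A fixed threshold like this one is exactly the kind of detector that struggles when speech energy is non-stationary or the background is noisy, which is the weakness the abstract cites as the motivation for adding visual information.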
