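Title | Speaker Detection Techniques in Continuous Audio Streams (连续音频流环境下的说话人检测技术) |
Author | Bai Junmei (白俊梅) |
Degree type | Doctor of Engineering |
Defense date | 2006-06-08 |
Degree-granting institution | Graduate University of Chinese Academy of Sciences (中国科学院研究生院) |
Place of conferral | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) |
Advisors | Xu Bo (徐波) ; Zhang Shuwu (张树武) |
Keywords | speaker detection; two-pass audio segmentation; GMM; F0 (pitch) correlogram; speaker classification |
Alternative title | Speaker detection in continuous audio streams |
Degree discipline | Pattern Recognition and Intelligent Systems |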
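Abstract (translated from Chinese) | Speaker detection is a pattern recognition technique, grounded in biometric recognition, that searches audio for a target speaker and locates how many times and at which time positions that speaker appears. It is a major research topic in applied speech recognition. Motivated by applications such as retrieval of television and radio broadcast audio and tracking of criminal suspects in telephone conversations, this thesis addresses the key problems and technical difficulties of speaker detection in continuous audio streams, with work in the following areas: For audio segmentation, a two-pass method based on entropy trends and KL2/VQ clustering is proposed. First, the entropy curve of the audio signal is used to pre-segment the stream and identify potential change points. To address the false-alarm rate of entropy-based segmentation, the KL2 distance and VQ clustering are each used to re-evaluate the pre-segmentation results, removing the many "false change points" among the candidates and further improving segmentation performance. For applications with large enrolled-speaker databases, a speaker classification method based on the F0 correlogram is proposed. Although this method outperforms traditional model-distance-based speaker classification in both accuracy and processing speed, it still cannot meet the real-time requirements of speaker detection/recognition over large speaker populations. We therefore use a fast matching algorithm that compresses the computation parameters, which effectively increases the processing speed of speaker detection. Using fast matching to speed up speaker detection over large populations shows promise for practical application. The performance of currently popular speaker recognition models is compared. Combining F0 features with MFCC features, an F0-based grouped GMM-UBM speaker recognizer is built and its parameters are optimized. Experimental results show that the F0-based grouped GMM-UBM speaker recognizer is robust. Finally, a speaker detection system is built. Noise removal and compensation techniques for real environments are analyzed, the performance of several noise compensation methods is verified experimentally, and the system is evaluated on broadcast audio streams and telephone conversation speech, with overall speaker detection results reported. |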
English abstract | Speaker detection is an important issue in the field of speech recognition. Its aim is to find target speakers in continuous audio streams. Speaker detection consists mainly of three steps: audio segmentation, speaker classification, and speaker recognition. Addressing several key problems of speaker detection in continuous audio streams, this thesis makes three main contributions. First, a two-pass audio segmentation method is proposed. The first pass uses the entropy curves of the audio signal to detect potential change points. Although the entropy-based method performs better than traditional BIC-based audio segmentation, its results still contain many false-alarm points, so we apply KL2 distance and VQ clustering to refine these potential change points. The two-pass segmentation strategy greatly improves performance. Second, to speed up recognition over a large speaker population, traditional methods first apply speaker clustering. However, errors in speaker clustering degrade the performance of speaker recognition. We instead apply feature compression to speed up the computation. The method performs well in large-population speaker recognition and is promising for future research. Last, a grouped speaker recognition system is reported that integrates the a posteriori probability of observing an MFCC vector with the pitch frequency (F0). The recognizer, which preserves the dependence between the vocal source and the vocal tract, is robust in noisy environments. In addition, several noise and channel compensation techniques are used to optimize the speaker detection system. Experiments on both broadcast streams and telephone dialogues demonstrate the good performance of the whole speaker detection system. |
Language | Chinese |
Other identifier | 200218014603191 |
Content type | Thesis |
Source URL | [http://ir.ia.ac.cn/handle/173211/5945] |
Collection | Graduates_Doctoral Dissertations |
Recommended citation (GB/T 7714) | 白俊梅. 连续音频流环境下的说话人检测技术[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2006. |
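
The second pass described in the abstract, re-scoring candidate change points with the KL2 distance, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each window of feature frames is modeled by a single diagonal-covariance Gaussian, and the names `diag_gaussian`, `kl2`, and `refine_change_points`, together with the `window` and `threshold` values, are hypothetical choices made only for illustration.

```python
from statistics import fmean, pvariance


def diag_gaussian(frames, floor=1e-6):
    """Per-dimension mean and variance of a list of feature frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    # Floor the variance so a constant dimension cannot cause division by zero.
    variances = [max(pvariance(d, mu), floor) for d, mu in zip(dims, means)]
    return means, variances


def kl2(frames_a, frames_b):
    """Symmetric KL divergence, KL(a||b) + KL(b||a), for diagonal Gaussians."""
    mean_a, var_a = diag_gaussian(frames_a)
    mean_b, var_b = diag_gaussian(frames_b)
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        total += va / vb + vb / va - 2.0                  # variance mismatch
        total += (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)   # mean mismatch
    return 0.5 * total


def refine_change_points(frames, candidates, window=50, threshold=1.0):
    """Keep only candidates whose left and right windows differ enough.

    `window` (in frames) and `threshold` are illustrative values, not
    parameters taken from the thesis.
    """
    kept = []
    for c in candidates:
        left = frames[max(0, c - window):c]
        right = frames[c:c + window]
        if len(left) < 2 or len(right) < 2:
            continue  # too little data to estimate a variance
        if kl2(left, right) >= threshold:
            kept.append(c)
    return kept
```

In this sketch a candidate survives only when the frames on either side of it look statistically different, which is how a first-pass false alarm inside a homogeneous region would be discarded. The VQ clustering variant mentioned in the abstract is not covered here.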