Classification of Music and Speech in Mandarin News Broadcasts
Chuan Liu; Lei Xie; Helen Meng
2007
Conference: NCMMSC2007
Abstract: Audio scene analysis refers to the problem of classifying segments in a continuous audio stream according to content, e.g. speech versus non-speech, music, ambient noise, etc. Techniques that support such automatic segmentation are indispensable for multimedia information processing. For example, segmentation is a precursor to processes such as indexing of speech segments by automatic speech recognition, automatic story segmentation based on recognition transcripts, and speaker diarization. This paper describes our work on the development of a speech/music discriminator for Mandarin broadcast news audio. We formed a high-dimensional feature vector of 94 coefficients in total, comprising LPCC, LPS and STFT coefficients. We also experimented with three classifiers: KNN, SVM and MLP. Experiments based on Voice of America Mandarin news broadcasts show high classification performance, with F-measure = 0.98. Among the three classifiers, the SVM also strikes the best balance between classification performance and computation time (real-time).
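The evaluation described in the abstract can be illustrated with a minimal sketch of one of the three classifier types (KNN) and the F-measure. This is not the authors' implementation: the function names `knn_predict` and `f_measure`, the choice of `k`, the Euclidean distance, and the toy 2-dimensional vectors are all assumptions made for illustration, whereas the paper uses 94-dimensional LPCC/LPS/STFT feature vectors that are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors.

    Euclidean distance and k=3 are illustrative assumptions.
    """
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def f_measure(y_true, y_pred, positive="speech"):
    """Balanced F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for per-segment feature vectors (hypothetical values).
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_y = ["speech", "speech", "speech", "music", "music", "music"]

test_X = [(0.12, 0.18), (0.88, 0.82), (0.20, 0.20), (0.70, 0.90)]
test_y = ["speech", "music", "speech", "music"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
score = f_measure(test_y, predictions)
```

Here `score` is the F-measure for the "speech" class on the toy test set. Swapping `knn_predict` for an SVM or MLP would leave the evaluation code unchanged.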
Indexed by: Other
Language: English
Content type: Conference paper
Source URL: http://ir.siat.ac.cn:8080/handle/172644/2030
Collection: Shenzhen Institutes of Advanced Technology, Institute of Advanced Integration Technology
Recommended citation
GB/T 7714
Chuan Liu, Lei Xie, Helen Meng. Classification of Music and Speech in Mandarin News Broadcasts[C]. In: NCMMSC2007.