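Title: 基于哼唱旋律的歌曲检索 (Song Retrieval Based on Hummed Melodies)
Author: Wu Xiao (吴晓)
Degree type: Doctoral
Defense date: 2009-05-23
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place conferred: Institute of Acoustics
Keywords: music retrieval; query by humming; melody matching; melody search
Alternative title: Song Melody Retrieval Based on Human Humming/Singing
Degree discipline: Signal and Information Processing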
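Abstract (translated from Chinese): Query by humming is a music retrieval technique that finds a target song from a melody the user hums. Unlike traditional search engines such as Google, it does not rely on textual information such as the song title, singer, or lyrics; it searches directly on melodic content. This natural form of retrieval can find the target song when textual information is missing, and it is also of practical value in scenarios where text input is inconvenient. This thesis studies the techniques involved in query by humming. Building on a working query-by-humming system, it addresses problems in melody representation, melody matching and alignment, efficient candidate retrieval, and song database organization, proposes a series of new solutions, and verifies them experimentally. The system described here has repeatedly taken first place in the MIREX international query-by-humming evaluation by a sizable margin, which also supports the effectiveness of the proposed techniques. The main work and contributions of the thesis are as follows:
1. A query-by-humming framework is proposed that integrates melodic information at three levels: acoustic, symbolic, and phrase. Each level of melody representation takes on a different responsibility in the retrieval system, so that the strengths of each can be fully exploited.
2. A new top-down melody matching strategy is proposed, and the recursive alignment algorithm RA is built on it. Unlike traditional dynamic programming methods, RA first matches the melodic contour at a large scale and then matches local details at a small scale. This emphasizes long-term rhythmic structure and reduces the influence of local mismatches on the overall result, strengthening both the algorithm's ability to discriminate against distractor melodies and its tolerance of various errors. In addition, an efficient alignment boundary adjustment algorithm, LBO, is proposed to further optimize RA's alignment boundaries.
3. A fuzzy contour factor voting algorithm at the acoustic level and a cascade filtering strategy at the symbolic level are proposed to prune the search space efficiently and improve system efficiency.
4. An automatic MIDI melody track labeling algorithm based on discriminative language models is proposed, which effectively reduces manual intervention in database processing. The thesis also applies this method to music genre classification and composer classification, with good results.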
English abstract: Query-by-humming (QBH) is a search technique that lets users retrieve music from a few seconds of humming or singing. Compared with traditional text search engines, a QBH system relies on melodic similarity instead of text metadata, and is effective when text information is unavailable or text input is inconvenient. Query-by-humming is far from a solved problem. Melody representation, melodic similarity measurement, efficient melody searching, and automatic database construction still limit the performance of QBH systems. This thesis addresses these challenges and proposes several strategies and algorithms for high-performance humming melody retrieval. Based on the proposed methods, a QBH system was developed that shows excellent retrieval performance and time/space efficiency. The system was submitted to the 2006 to 2008 MIREX QBH evaluations and performed best in most tasks. The thesis makes the following contributions:
1. A novel framework based on a three-level melody representation is proposed. The framework adopts different melody representations at different search stages, and so benefits from the accuracy of the acoustic pitch series representation, the efficiency of the symbolic note sequence representation, and the stability of the melodic contour representation.
2. A heuristic algorithm called recursive alignment (RA) is proposed to match melodies and measure their similarity at the acoustic level. Compared with traditional dynamic programming approaches, RA performs global contour matching before local detail tuning, and thus favors melody candidates with a well-matched rhythm structure over those with well-matched local notes. This top-down strategy helps RA balance discriminative ability against error tolerance. To further improve RA's local alignment, a linear algorithm called local boundary optimization (LBO) is also presented.
3. A sentence-level fuzzy contour trigram voting (FCTV) algorithm and a symbolic-level cascade filtering strategy are proposed to reduce the candidate pool efficiently. Results show that these strategies significantly accelerate the search without an evident decrease in retrieval accuracy.
4. A discriminative language modeling approach is proposed for melody track selection, which reduces manual work and increases the level of automation in melody database generation. The same algorithm is also applied to melody genre classification and composer classification, and gives promising results in both tasks.
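Language: Chinese
Release date: 2011-05-07
Pages: 125
Content type: Dissertation
Source URL: http://159.226.59.140/handle/311008/512
Collection: Institute of Acoustics_Master's and Doctoral Theses of the Institute of Acoustics_1981-2009 Master's and Doctoral Theses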
Recommended citation
GB/T 7714
Wu Xiao (吴晓). 基于哼唱旋律的歌曲检索[D]. Institute of Acoustics. Institute of Acoustics, Chinese Academy of Sciences. 2009.