Title: 通用视觉对象分类方法与系统
Author: 程刚 (Cheng Gang)
Degree: Doctor of Engineering
Defense Date: 2010-05-29
Degree-Granting Institution: Graduate University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Supervisor: 王春恒 (Wang Chunheng)
Keywords: visual object categorization; multi-feature fusion; adaptive binarized data transformation; scene image categorization; discriminative dictionary learning
Other Title: Generic Visual Object Categorization Method and System
Degree Major: Control Theory and Control Engineering
Chinese Abstract: With the widespread use of image acquisition devices such as digital cameras, webcams, and high-speed scanners, and the rapid growth of the Internet, the number of digital images is increasing exponentially. Enabling computers to automatically understand image content and to approach human visual categorization ability is a central topic in computer vision research. This thesis addresses generic visual object categorization from the perspectives of classification system design, feature extraction and transformation, classifier fusion, and dictionary learning. The main contributions are as follows.
First, a visual object categorization system based on multi-feature fusion is proposed and implemented. Built on the Bag-of-Features model, the system combines two detectors with five descriptors to form multiple local features, adds structural information through spatial pyramid partitioning to obtain multi-channel Bag-of-Features histograms, applies an adaptive binarized feature transformation to these histograms, and finally fuses the transformed channels through kernel functions to improve categorization accuracy. The system was entered in the international competition The PASCAL Visual Object Classes Challenge 2009 (VOC2009) and achieved satisfactory results.
Second, an important application of visual object categorization is classifying and managing images by their semantics, and scene image categorization is one of the most common such tasks. For this problem, a scene categorization method that fuses structural and texture features is proposed. A two-stage classifier cascade is used: the first stage uses global structural information to obtain candidate categories and identifies pairs of similar categories from its results, while the second stage uses local texture information to separate the similar categories. The cascade thus exploits both the global structure and the local texture of scene images. Experiments show that the method classifies different scene categories robustly, distinguishes similar categories effectively, and achieves the best classification accuracy known at the time on the 15-class scene dataset.
Third, dictionary learning is particularly important in the Bag-of-Features model, and traditional methods train the dictionary by minimizing reconstruction error. To improve the discriminative power of Bag-of-Features representations, a discriminative sparse-representation dictionary learning method is proposed. Unlike other approaches that incorporate discriminative information, and tailored to the characteristics of Bag-of-Features, the method uses not only the class labels of image patches but also Fisher discriminant information computed at the whole-image level to improve the dictionary's discriminability. Experiments show that this method achieves better classification performance than traditional dictionary learning methods.
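The first contribution fuses several Bag-of-Features channels through a kernel (the English abstract names an extended Gaussian kernel). The Python sketch below illustrates that idea only; the chi-square channel distance, the mean-value binarization threshold, and the per-channel weights gamma are assumptions, not details taken from the thesis.

```python
import numpy as np

def binarize_histogram(h, threshold=None):
    """Binarize a Bag-of-Features histogram.

    The thesis's adaptive rule is not given in the abstract;
    as a stand-in, each bin is thresholded at the histogram's mean.
    """
    if threshold is None:
        threshold = h.mean()
    return (h > threshold).astype(np.float64)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def fused_kernel(x_channels, y_channels, gammas):
    """Extended Gaussian kernel over several feature channels:
    K(x, y) = exp(-sum_c gamma_c * d_c(x_c, y_c))."""
    d = sum(g * chi2_distance(hx, hy)
            for g, hx, hy in zip(gammas, x_channels, y_channels))
    return np.exp(-d)

# Toy usage with two channels (e.g., two detector/descriptor combinations).
x = [np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.1, 0.8])]
y = [np.array([0.3, 0.4, 0.3]), np.array([0.2, 0.2, 0.6])]
k = fused_kernel([binarize_histogram(h) for h in x],
                 [binarize_histogram(h) for h in y],
                 gammas=[1.0, 1.0])
print(k)
```

The resulting kernel matrix can be passed to any kernel classifier (e.g., an SVM with a precomputed kernel), which is the usual way such multi-channel fusion is applied.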
English Abstract: With the proliferation of image capturing devices such as digital cameras, video recorders, and scanners, the number of digital images is growing explosively. A person can recognize thousands of object categories, but it is difficult for a computer to reach this level of performance, and more and more researchers are working on this problem. This thesis presents research on system design, feature extraction and transformation, classifier fusion, and dictionary learning. First, a multi-feature fusion system for visual object categorization is proposed. The system is based on the Bag-of-Features model: two detectors and five descriptors are combined to capture local features, and spatial pyramid matching is used to incorporate structural information. Each feature channel is then transformed by an adaptive binarized data transformation, and all channels are finally fused by an extended Gaussian kernel. The performance of the system is evaluated on The PASCAL Visual Object Classes Challenge 2009 (VOC2009). In addition, scene classification is one of the most important applications of visual object categorization. A scene image categorization method based on the fusion of structure and texture is proposed: structural information is the input to the first classifier, pairs of similar categories are identified from the first-stage results, and the second classifier uses texture information to distinguish the similar categories. Experiments demonstrate that the proposed method achieves state-of-the-art results. Finally, dictionary learning is important to the performance of the Bag-of-Features model. Most existing methods are trained by minimizing reconstruction error. To improve the discriminative capacity of local features, a discriminative dictionary learning approach is introduced: a discrimination measure inspired by linear discriminant analysis is incorporated into traditional dictionary learning, and experiments verify the effectiveness of the method.
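For the third contribution, the abstract states only that a discrimination measure inspired by linear discriminant analysis is added to conventional reconstruction-based dictionary learning. A plausible form of such an objective (a sketch, not the thesis's exact formulation) is shown below, where Y stacks the local descriptors, D is the dictionary, X the sparse codes, S_W and S_B the within- and between-class scatter of image-level coding vectors, and λ, η illustrative trade-off weights.

```latex
\min_{D,\,X}\;\; \|Y - DX\|_F^2
\;+\; \lambda \|X\|_1
\;+\; \eta \,\bigl(\operatorname{tr}(S_W(X)) - \operatorname{tr}(S_B(X))\bigr)
```

The first two terms are the standard sparse-coding objective; the third rewards codes whose image-level statistics are compact within a class and separated across classes, which is the Fisher-style discriminative information the abstract describes.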
Language: Chinese
Other Identifier: 200718014628031
Content Type: Doctoral dissertation
Source URL: [http://ir.ia.ac.cn/handle/173211/6259]
Collection: Graduates / Doctoral Dissertations
Recommended Citation (GB/T 7714):
程刚 (Cheng Gang). 通用视觉对象分类方法与系统 (Generic Visual Object Categorization Method and System) [D]. Institute of Automation, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences, 2010.