Select strong information features to improve text categorization effectiveness
Dejun Xue ; Maosong Sun
2010-05-06
Keywords: Practical; Theoretical or Mathematical / feature extraction; information theory; pattern classification; text analysis / strong information features; text categorization; high feature dimensionality; weak information features; constrained information gain measure; feature selection measure; Chi measure; class centroid based classifier; Chinese character bigram features; Chinese documents; large scale document collection / C7240 Information analysis and indexing; C1250 Pattern recognition; C1260 Information theory
Abstract: High feature dimensionality is one of the main obstacles in text categorization (TC). This paper focuses on feature selection as a way to reduce feature dimensionality in TC. We first classified the original features into three types according to their contribution to categorization: strong information features, weak information features, and irrelevant features. We then proposed the constrained information gain (CIG) measure, which favors low-frequency informative features by ignoring the negative evidence used in the classic IG measure. Concentrating on the first type of feature, we further proposed a novel feature selection measure, Chi-CIG, which combines the Chi and CIG measures. Using a class-centroid-based classifier and Chinese character bigram features, we designed a TC system for Chinese documents. Experimental results on a large-scale document collection (71,674 documents) indicated that the Chi-CIG measure produced a more effective feature set for categorization than the classic Chi and IG measures did.
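The abstract does not give the formulas, so the following is only a minimal Python sketch of the two textbook measures the paper builds on: the chi-square (Chi) statistic for a term/class contingency table and information gain (IG) over all classes, using natural logarithms. The `use_negative_evidence=False` switch is a hypothetical reading of CIG that simply drops the IG term computed over documents lacking the feature; it is an assumption, not the authors' published definition. The Chi-CIG combination is not reproduced because the abstract does not say how the two measures are combined. The function names `chi_square` and `information_gain` are illustrative.

```python
import math
from typing import List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for one term and one class.

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class not containing the term
    d: docs outside the class not containing the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: treat the term as uninformative
    return n * (a * d - c * b) ** 2 / denom


def _sum_p_log_p(counts: List[int]) -> float:
    """Return sum_c P(c) * ln P(c) for raw per-class counts (<= 0)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((k / total) * math.log(k / total) for k in counts if k > 0)


def information_gain(docs: List[Set[str]], labels: List[str], term: str,
                     use_negative_evidence: bool = True) -> float:
    """Classic IG of `term`; with use_negative_evidence=False, an ASSUMED
    CIG-style variant that ignores documents lacking the term."""
    classes = sorted(set(labels))
    n = len(docs)
    with_t = {cls: 0 for cls in classes}     # per-class counts, term present
    without_t = {cls: 0 for cls in classes}  # per-class counts, term absent
    for doc, label in zip(docs, labels):
        if term in doc:
            with_t[label] += 1
        else:
            without_t[label] += 1

    # Class entropy H(C) = -sum_c P(c) ln P(c)
    priors = [with_t[cls] + without_t[cls] for cls in classes]
    ig = -_sum_p_log_p(priors)

    # Positive evidence: P(t) * sum_c P(c|t) ln P(c|t)
    n_t = sum(with_t.values())
    ig += (n_t / n) * _sum_p_log_p(list(with_t.values()))

    # Negative evidence: P(~t) * sum_c P(c|~t) ln P(c|~t)
    if use_negative_evidence:
        ig += ((n - n_t) / n) * _sum_p_log_p(list(without_t.values()))
    return ig


if __name__ == "__main__":
    # Toy corpus: each document is a set of (e.g. character-bigram) features.
    docs = [{"ab", "bc"}, {"ab"}, {"cd"}, {"cd", "bc"}]
    labels = ["x", "x", "y", "y"]
    print(chi_square(2, 0, 0, 2))                 # "ab" vs class x
    print(information_gain(docs, labels, "ab"))   # perfectly predictive term
    print(information_gain(docs, labels, "bc"))   # evenly spread term
    print(information_gain(docs, labels, "bc", use_negative_evidence=False))
```

In this toy corpus, "bc" appears equally in both classes, so its classic IG is 0, while the assumed CIG-style variant gives it 0.5 ln 2. Because the dropped negative-evidence term is never positive, removing it can only raise a feature's score, and the increase is largest for rare features, where P(~t) is close to 1.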
Language: English
Publisher: Freund & Pettman Publishers, UK
Content type: Journal article
Source URL: http://hdl.handle.net/123456789/10064
Collection: Tsinghua University
Recommended citation
GB/T 7714
Dejun Xue, Maosong Sun. Select strong information features to improve text categorization effectiveness[J]. 2010.
APA Xue, D., & Sun, M. (2010). Select strong information features to improve text categorization effectiveness.
MLA Xue, Dejun, and Maosong Sun. "Select strong information features to improve text categorization effectiveness." 2010.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.


© 2017 CSpace. All rights reserved. Powered by CSpace