A Feature Extraction Method Based on a Concept Shielded Level for Chinese Text Classification (中文文本分类中基于概念屏蔽层的特征提取方法)
LIAO Sha-sha (廖莎莎) ; JIANG Ming-hu (江铭虎)
2010-06-07
Keywords: computer application ; Chinese information processing ; text classification ; feature selection ; concept extraction ; concept tree ; shielded level ; description power ; TP391.1
Alternative title: A Feature Selection Method in Chinese Text Classification Based on Concept Extraction with a Shielded Level
Abstract: This paper proposes a novel feature selection method based on concept extraction and a shielded level. The method uses the concept tree in the HowNet concept dictionary: concept attributes (sememes) are extracted according to their positions in the tree, and each is given a weight that expresses its description power. A sememe whose weight is below the shielded level is not selected into the feature set, and the original word is retained instead. For each word, the weights of the sememes in its DEF entry are computed to decide whether to keep the original word as a feature or to extract its concepts. The paper focuses on how to assign suitable weights to sememes, how to balance word features against concept features, and how to weight words that are not in the dictionary. Experiments show that a properly set shielded level reduces the feature dimension, improves classification accuracy to some extent, and narrows the gap in classification accuracy between categories.
Funding: Ministry of Education Excellent Young Teachers Program (2051) ; Open Project Fund of the National Laboratory of Pattern Recognition, Chinese Academy of Sciences (10) ; Tsinghua University 985 Phase I Basic Research Fund, 2003
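The selection rule described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy concept tree `PARENT`, the toy `DEF` entries, the depth-based `weight` function, and the rule that a word is replaced by its sememes whenever at least one of them reaches the shielded level. The abstract does not give the paper's actual weighting formula or decision rule, and this sketch does not use the real HowNet data or any HowNet API.

```python
from collections import Counter

# Hypothetical toy concept tree: sememe -> parent (None marks the root).
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "institution": "thing",
}

# Hypothetical DEF entries: word -> sememes that describe it.
DEF = {
    "teacher": ["human"],
    "student": ["human"],
    "school": ["institution"],
    "object": ["thing"],
}


def depth(sememe):
    """Number of edges between a sememe and the root of the toy tree."""
    d = 0
    while PARENT[sememe] is not None:
        sememe = PARENT[sememe]
        d += 1
    return d


def weight(sememe):
    """Assumed weighting: deeper sememes are more specific, so they
    get a larger description power. The paper's formula may differ."""
    return float(depth(sememe))


def select_features(tokens, shielded_level):
    """Map each token either to concept features or to itself."""
    features = Counter()
    for tok in tokens:
        sememes = DEF.get(tok)
        if sememes is None:
            # Word outside the dictionary: keep the word itself.
            features[tok] += 1
            continue
        kept = [s for s in sememes if weight(s) >= shielded_level]
        if kept:
            # Concept extraction: the word is replaced by its sememes.
            for s in kept:
                features["C:" + s] += 1
        else:
            # All sememes fall below the shielded level: keep the word.
            features[tok] += 1
    return features


features = select_features(["teacher", "student", "object", "hownet"], 2)
print(features)
```

In this toy run, "teacher" and "student" collapse into the single concept feature `C:human`, which is how concept extraction shrinks the feature dimension, while "object" is kept as a word because its only sememe is too general to pass the shielded level.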
Language: Chinese
Content type: Journal article
Source URL: [http://hdl.handle.net/123456789/43227]
Collection: Tsinghua University
Recommended citation
GB/T 7714
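LIAO Sha-sha, JIANG Ming-hu. A Feature Selection Method in Chinese Text Classification Based on Concept Extraction with a Shielded Level[J], 2010.
APA LIAO Sha-sha, & JIANG Ming-hu. (2010). A Feature Selection Method in Chinese Text Classification Based on Concept Extraction with a Shielded Level.
MLA LIAO Sha-sha, and JIANG Ming-hu. "A Feature Selection Method in Chinese Text Classification Based on Concept Extraction with a Shielded Level." (2010).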