Topic Identification Technology for Chinese Text Based on Statistics and Semantic Information (基于统计学和语义信息的中文文本主题识别技术)
FENG Jin (冯晋) ; LI Chunping (李春平)
2010-06-09
Keywords: information extraction; Chinese keyword extraction; association analysis; text mining
Classification number: TP391.1
Alternative title: Topic detection technology for Chinese text based on statistics and semantic information
Abstract: The complexity of Chinese word segmentation has to some extent held back Chinese information extraction, so the need to extract the topics of Chinese texts quickly and effectively has become increasingly pressing. This paper proposes an extraction method that uses Chinese word segmentation, frequent-word search, and part-of-speech combination computation to analyze the associations between words and to extract the topic words and phrases that express the content of a document. The extracted terms are also scored and ranked, so that a reader can judge the topic and main content of a document from them. Experiments on the People's Daily corpus show that the accuracy of the method stays above 66%, and tests on real documents such as web pages and emails also give good results.
Key words: information retrieval; Chinese
Funding: National "863" High-Tech Program (2002AA444120)
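The abstract names three steps: segmentation, frequent-word search, and part-of-speech combination, followed by scoring and ranking. The sketch below is only an illustration of that general pipeline, not the authors' method. It assumes the text has already been segmented and tagged by some Chinese segmenter, and the function name `extract_topics`, the tag set, the allowed POS patterns, the frequency threshold, and the scoring rule are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical POS combinations accepted as topic phrases
# (n = noun, v = verb, a = adjective). The record does not list the paper's patterns.
ALLOWED_PATTERNS = {("n", "n"), ("a", "n"), ("n", "v"), ("v", "n")}
CONTENT_TAGS = {"n", "v", "a"}


def extract_topics(tagged_tokens, min_freq=2, top_k=5):
    """Rank candidate topic words and two-word phrases.

    tagged_tokens: list of (word, pos) pairs produced by any segmenter.
    """
    # Step 1: find frequent content words.
    word_freq = Counter(w for w, pos in tagged_tokens if pos in CONTENT_TAGS)
    frequent = {w for w, c in word_freq.items() if c >= min_freq}

    # Step 2: count adjacent frequent words whose POS combination is allowed.
    pair_freq = Counter()
    for (w1, p1), (w2, p2) in zip(tagged_tokens, tagged_tokens[1:]):
        if w1 in frequent and w2 in frequent and (p1, p2) in ALLOWED_PATTERNS:
            pair_freq[(w1, w2)] += 1

    # Step 3: score and sort. Doubling phrase counts is an arbitrary assumption.
    scores = {w: float(word_freq[w]) for w in frequent}
    for (w1, w2), c in pair_freq.items():
        if c >= min_freq:
            scores[w1 + w2] = 2.0 * c
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy input: a hand-tagged sentence fragment, not data from the paper.
tokens = [
    ("文本", "n"), ("挖掘", "v"), ("是", "v"), ("重要", "a"), ("技术", "n"),
    ("，", "w"), ("文本", "n"), ("挖掘", "v"), ("技术", "n"), ("发展", "v"),
]
print(extract_topics(tokens))
```

On this toy input the phrase 文本挖掘 ranks first because both of its parts are frequent and the pair itself recurs. A real system would need a proper segmenter, a tuned pattern set, and a scoring function validated against a corpus.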
Language: Chinese
Content type: Journal article
Source URL: [http://hdl.handle.net/123456789/56551]
Collection: Tsinghua University
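Recommended citation
GB/T 7714
FENG Jin, LI Chunping. 基于统计学和语义信息的中文文本主题识别技术 [Topic detection technology for Chinese text based on statistics and semantic information][J]. 2010.
APA FENG Jin, & LI Chunping. (2010). 基于统计学和语义信息的中文文本主题识别技术 [Topic detection technology for Chinese text based on statistics and semantic information].
MLA FENG Jin, and LI Chunping. "基于统计学和语义信息的中文文本主题识别技术 [Topic detection technology for Chinese text based on statistics and semantic information]." (2010).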