Topic detection technology for Chinese text based on statistics and semantic information


	Topic detection technology for Chinese text based on statistics and semantic information (基于统计学和语义信息的中文文本主题识别技术)
	FENG Jin (冯晋) ; LI Chunping (李春平)
	2010-06-09 ; 2010-06-09
Keywords	information extraction ; Chinese keyword extraction ; association analysis ; text mining ; TP391.1
Other title	Topic detection technology for Chinese text based on statistics and semantic information
Abstract	The complexity of Chinese word segmentation has, to some extent, held back the development of Chinese information extraction, so the need to extract the topics of Chinese texts quickly and effectively has become increasingly pressing. This paper analyzes the associations between words using Chinese word segmentation, frequent-word search, and part-of-speech combination computation, and extracts the topic words and phrases that express the content of an article. The extracted terms are also scored and ranked, so that readers can use them to judge an article's topic and key content. Experiments on the People's Daily corpus show that the method's accuracy stays above 66%, and tests on real documents such as web pages and emails also gave good results. Supported by the National "863" High-Tech Program of China (2002AA444120).
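Language	Chinese
Content type	Journal article
Source URL	[http://hdl.handle.net/123456789/56551]
Collection	Tsinghua University
Recommended citation (GB/T 7714)	FENG Jin, LI Chunping. Topic detection technology for Chinese text based on statistics and semantic information[J]. 2010.
APA	Feng, J., & Li, C. (2010). Topic detection technology for Chinese text based on statistics and semantic information.
MLA	Feng, Jin, and Chunping Li. "Topic detection technology for Chinese text based on statistics and semantic information." (2010).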
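
The abstract names four steps: word segmentation, frequent-word search, part-of-speech combination, and scoring with ranking. The record gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' method. It assumes the text has already been segmented and tagged into (word, part-of-speech) pairs; the function name `extract_topic_terms`, the tag set, the allowed tag combinations in `PHRASE_PATTERNS`, the frequency threshold, and the phrase weight are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical part-of-speech combinations accepted as phrases (assumption,
# not taken from the paper): noun+noun, adjective+noun, verb+noun.
PHRASE_PATTERNS = {("n", "n"), ("a", "n"), ("v", "n")}
# Hypothetical set of tags counted as content words.
CONTENT_TAGS = {"n", "v", "a"}


def extract_topic_terms(tagged_tokens, min_freq=2, top_k=5):
    """Rank frequent words and adjacent-word phrases from (word, pos) pairs."""
    # Step 1: frequent-word search over content words only.
    word_freq = Counter(w for w, p in tagged_tokens if p in CONTENT_TAGS)
    frequent = {w for w, c in word_freq.items() if c >= min_freq}

    # Step 2: count adjacent frequent words whose tags form an allowed pattern.
    pair_freq = Counter()
    for (w1, p1), (w2, p2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (p1, p2) in PHRASE_PATTERNS and w1 in frequent and w2 in frequent:
            pair_freq[w1 + w2] += 1

    # Step 3: score. Words score their frequency; phrases are weighted
    # double (an arbitrary illustrative weight).
    scores = {w: float(word_freq[w]) for w in frequent}
    for phrase, count in pair_freq.items():
        if count >= min_freq:
            scores[phrase] = 2.0 * count

    # Step 4: rank by descending score, breaking ties by the term itself.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy input: already segmented and tagged (n = noun, v = verb, u = particle).
tokens = [
    ("主题", "n"), ("词汇", "n"), ("的", "u"), ("提取", "v"),
    ("主题", "n"), ("词汇", "n"), ("排序", "v"),
]
result = extract_topic_terms(tokens)
print(result)
```

In this toy input the phrase 主题词汇 occurs twice and therefore ranks above its component words; a real system would obtain the tagged tokens from a Chinese word segmenter and would need weights tuned on a corpus.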
