Text Retrieval Based on Pseudo-Feedback and Classification (基于伪反馈与分类的文本检索)
Canhui Wang (王灿辉); Liyun Ru (茹立云); Min Zhang (张敏); Shaoping Ma (马少平)
2010-07-15
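Conference name: Proceedings of the 8th National Joint Symposium on Computational Linguistics (JSCL-2005); 8th National Joint Symposium on Computational Linguistics (JSCL-2005); Nanjing, China; CNKI; Nanjing Normal University; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University
Keywords: text retrieval; pseudo-feedback; classification; Rocchio approach; TP391.3
Other title: Text Retrieval Based on Pseudo-Feedback and Classification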
Abstract: Term mismatch between the query space and the document space is a hard problem in text retrieval, and it is especially severe in sentence-level retrieval. Query expansion, proposed to address this mismatch, introduces difficulties of its own that are hard to eliminate. Classification-based methods sidestep the mismatch and are therefore a feasible way to implement sentence retrieval, but in practice they suffer from a lack of positive samples. This paper uses negative samples, following the Rocchio approach, to extract possibly relevant documents from unlabeled data; the extracted documents are weighted by length, and those with high credibility are added to the positive sample set. An SVM classifier trained on this set outperforms direct query-based retrieval. The paper also proposes pseudo-feedback to supplement the positive samples: the results of an initial query-based retrieval are filtered by the Rocchio approach and treated as positive samples, and SVM classification on this basis improves retrieval performance further.
Funding: Supported by the National Key Basic Research Program (973) (2004CB318108) and the Natural Science Foundation (60223004, 60321002, 60303005).
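The Rocchio-based selection step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, since the record does not give the paper's formulas: the whitespace tokenizer, the prototype weights `alpha` and `beta`, the logarithmic length weighting, the `top_k` cutoff, and the function name `select_pseudo_positives` are all hypothetical. The SVM training step is omitted; the documents returned here would serve as its positive samples.

```python
import math
from collections import Counter


def tf_vector(text):
    """Unit-length term-frequency vector built from whitespace tokens."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def mean_vector(vectors):
    """Component-wise mean of sparse vectors (assumes a non-empty list)."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def prototype(own, other, alpha, beta):
    """Rocchio prototype: alpha * own centroid - beta * other centroid."""
    terms = set(own) | set(other)
    return {t: alpha * own.get(t, 0.0) - beta * other.get(t, 0.0) for t in terms}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_pseudo_positives(seeds, negatives, unlabeled,
                            alpha=16.0, beta=4.0, top_k=2):
    """Return up to top_k unlabeled documents that look most like positives.

    seeds: positive seed texts (assumed here: the query or first-pass results).
    alpha, beta, the log length weight, and top_k are illustrative assumptions.
    """
    pos_mean = mean_vector([tf_vector(s) for s in seeds])
    neg_mean = mean_vector([tf_vector(n) for n in negatives])
    pos_proto = prototype(pos_mean, neg_mean, alpha, beta)
    neg_proto = prototype(neg_mean, pos_mean, alpha, beta)

    scored = []
    for doc in unlabeled:
        vec = tf_vector(doc)
        margin = cosine(vec, pos_proto) - cosine(vec, neg_proto)
        if margin > 0:  # closer to the positive prototype than the negative one
            # Assumed length weighting: longer documents earn more credibility.
            credibility = margin * math.log(1 + len(doc.split()))
            scored.append((credibility, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

In this sketch a document is kept only when it is more similar to the positive prototype than to the negative one, and the length weight then ranks the survivors. Using the query as the sole seed corresponds loosely to the first setting in the abstract; passing first-pass retrieval results as `seeds` corresponds loosely to the pseudo-feedback variant.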
Proceedings publisher: Tsinghua University Press
Language: Chinese
Content type: Conference paper
Source URL: http://hdl.handle.net/123456789/69956
Collection: Tsinghua University
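Recommended citation
GB/T 7714
Wang Canhui, Ru Liyun, Zhang Min, et al. Text Retrieval Based on Pseudo-Feedback and Classification[C]. In: Proceedings of the 8th National Joint Symposium on Computational Linguistics (JSCL-2005), 8th National Joint Symposium on Computational Linguistics (JSCL-2005), Nanjing, China, CNKI, Nanjing Normal University; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University.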