XDist: an effective XML keyword search system with re-ranking model based on keyword distribution | |
Gao Ning ; Deng ZhiHong ; Lu ShengLong | |
Journal | Science China Information Sciences |
2014 | |
Keywords | XML keyword search; information retrieval; ranking model; keyword distribution; evaluation; term proximity |
DOI | 10.1007/s11432-012-4781-6 |
Abstract | Keyword search lets web users access XML data without understanding complex data schemas. However, the inherent ambiguity of keyword search makes it hard to select relevant results that match the keywords. To address this, researchers have devoted much effort to ranking models that distinguish relevant from irrelevant passages, such as the widely cited TF*IDF and BM25. These statistics-based ranking methods mostly consider term frequency, inverse document frequency and length as ranking factors, and ignore the distribution of, and connections between, different keywords. As a result, they cannot recognize irrelevant results that have high term frequency, which limits their performance. This paper proposes a new search system, XDist, to address these problems. XDist first uses the semantic query model maximal lowest common ancestor (MAXLCA) to identify the results returned for a given query, and then ranks these candidate results with BM25. XDist then re-ranks the top several results with a combined distribution measurement (CDM), which considers four criteria: term proximity, intersection of keyword classes, degree of integration among keywords, and quantity variance of keywords. The weights of the four measures in CDM are trained by a listwise learning to optimize method. Experimental results on the INEX evaluation platform show that re-ranking with CDM improves on the BM25 baseline by 22% under iP[0.01] and 18% under MAiP. In addition, the semantic model MAXLCA and the search engine XDist perform the best in their respective related fields.; http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000334860600001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8e1609b174ce4e31116a60747a720701 ; Computer Science, Information Systems; SCI(E); 2; ARTICLE; zhdeng@cis.pku.edu.cn; 5; 57 |
Language | English |
Content type | Journal article |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/152121] |
Collection | School of Information Science and Technology (信息科学技术学院) |
Recommended citation (GB/T 7714) | Gao Ning, Deng ZhiHong, Lu ShengLong. XDist: an effective XML keyword search system with re-ranking model based on keyword distribution[J]. Science China Information Sciences, 2014. |
APA | Gao Ning, Deng ZhiHong, & Lu ShengLong. (2014). XDist: an effective XML keyword search system with re-ranking model based on keyword distribution. Science China Information Sciences. |
MLA | Gao Ning, et al. "XDist: an effective XML keyword search system with re-ranking model based on keyword distribution". Science China Information Sciences (2014). |
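
The abstract describes a two-stage pipeline: rank candidates with BM25, then re-rank only the top few results using keyword-distribution evidence. The Python sketch below illustrates that general idea under loud assumptions. It is not the authors' implementation: the record does not define CDM's formulas, so `proximity` is a hypothetical stand-in for one of the four criteria (term proximity), the single hand-set `weight` replaces the listwise-trained weights, and MAXLCA candidate selection is omitted. The function names and the parameter defaults `k1=1.2`, `b=0.75` are illustrative choices.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with standard BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            # The "+ 1.0" inside the log keeps the IDF non-negative.
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def proximity(query, doc):
    """Hypothetical proximity measure (an assumption, not the paper's):
    1 / length of the smallest window containing every query term,
    or 0.0 if some query term is missing from the document."""
    needed = set(query)
    positions = [(i, t) for i, t in enumerate(doc) if t in needed]
    best = None
    for start in range(len(positions)):
        seen = set()
        for end in range(start, len(positions)):
            seen.add(positions[end][1])
            if seen == needed:
                span = positions[end][0] - positions[start][0] + 1
                if best is None or span < best:
                    best = span
                break
    return 0.0 if best is None else 1.0 / best


def rerank(query, docs, top_k=3, weight=1.0):
    """Rank all documents by BM25, then re-order only the top_k results
    by BM25 score plus a weighted proximity bonus."""
    base = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: base[i], reverse=True)
    head, tail = order[:top_k], order[top_k:]
    head.sort(key=lambda i: base[i] + weight * proximity(query, docs[i]),
              reverse=True)
    return head + tail
```

Calling `rerank(["xml", "search"], docs)` on a list of tokenized documents returns document indices in their final order. With `weight=0` the result is the plain BM25 ranking, which makes it easy to compare the baseline against the re-ranked list.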