Optimize document identifier assignment for inverted index compression
Chen, Chong ; He, Jing ; Shan, Dongdong ; Yan, Hongfei
Journal: Journal of Computational Information Systems
2010
Abstract: Document identifier assignment is a technique for inverted file index compression that works by reducing the d-gap values of posting lists. Existing studies have approached it with either TSP or clustering methods. However, the problem has lacked a proper formulation, and the existing approaches have no theoretical guarantee of being good approximations. In this paper, we first formulate the document identifier assignment problem as an optimization problem, and then propose a new method to solve it approximately. Our method first clusters the documents by URL information and then rearranges the documents and clusters using a benefit function, which is derived by directly minimizing posting space. The TSP method can be considered a simple special case of our method. The experiments show that it achieves a good trade-off between efficiency and effectiveness. © 2010 Binary Information Press.; EI; 0; 2; 339-346; 6
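The link between identifier assignment and posting space can be illustrated with a small sketch. This is not the paper's benefit function or its URL-based clustering; it only shows how a permutation of document identifiers changes the d-gaps, and therefore the encoded size, of the same posting lists. The Elias gamma code used to measure size, the toy index, and all function names are assumptions made for illustration.

```python
from typing import Dict, List


def d_gaps(postings: List[int]) -> List[int]:
    """Turn a sorted list of doc IDs into d-gaps (the first ID is kept as-is)."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1


def index_cost(index: Dict[str, List[int]], new_id: Dict[int, int]) -> int:
    """Total gamma-coded size in bits of all posting lists after remapping IDs."""
    total = 0
    for postings in index.values():
        remapped = sorted(new_id[d] for d in postings)
        total += sum(gamma_bits(g) for g in d_gaps(remapped))
    return total


# Hypothetical toy index: two terms whose documents are interleaved.
index = {"x": [1, 3, 5], "y": [2, 4, 6]}

# Original assignment (identity) versus one that places documents
# sharing a term next to each other.
identity = {d: d for d in range(1, 7)}
clustered = {1: 1, 3: 2, 5: 3, 2: 4, 4: 5, 6: 6}

print(index_cost(index, identity))   # 16 bits
print(index_cost(index, clustered))  # 10 bits
```

In this toy case, giving consecutive identifiers to documents that share terms turns most gaps into 1, which lowers the total from 16 to 10 bits. Finding a good permutation for a real collection is the optimization problem the abstract describes.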
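Language: English
Content type: Journal article
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/327690]
Collection: School of Information Science and Technology
Recommended citation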
GB/T 7714
Chen, Chong, He, Jing, Shan, Dongdong, et al. Optimize document identifier assignment for inverted index compression[J]. journal of computational information systems, 2010.
APA Chen, Chong, He, Jing, Shan, Dongdong, & Yan, Hongfei. (2010). Optimize document identifier assignment for inverted index compression. journal of computational information systems.
MLA Chen, Chong, et al. "Optimize document identifier assignment for inverted index compression". journal of computational information systems (2010).
 
