A holistic solution for duplicate entity identification in deep web data integration


	Liu, Wei ; Meng, Xiaofeng
	2010
Abstract	The proliferation of the deep Web offers users a great opportunity to search for high-quality information on the Web. As a necessary step in deep Web data integration, duplicate entity identification aims to discover duplicate records across the integrated Web databases for further applications (e.g., price-comparison services). However, most existing work addresses this issue only between two data sources, which is not practical for deep Web data integration systems: a duplicate entity matcher trained over two specific Web databases cannot be applied to other Web databases. In addition, the cost of preparing the training set for n Web databases is $C_n^2$ times higher than that for two Web databases, where $C_n^2 = n(n-1)/2$ is the number of database pairs. In this paper, we propose a holistic solution to address the new challenges posed by the deep Web, whose goal is to build one duplicate entity matcher over multiple Web databases. Extensive experiments on two domains show that the proposed solution is highly effective for deep Web data integration. © 2010 IEEE. EI.
Language	English
DOI	10.1109/SKG.2010.38
Content type	Other
Source URL	[http://ir.pku.edu.cn/handle/20.500.11897/321564]
Collection	Institute of Computer Science and Technology
Recommended citation (GB/T 7714)	Liu, Wei, Meng, Xiaofeng. A holistic solution for duplicate entity identification in deep web data integration. 2010-01-01.
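
The abstract's cost argument (one training set per pair of Web databases, so $C_n^2$ sets for n databases) can be illustrated with a short sketch. This is not the authors' method; it only counts pairs, and the database names and the function name `pairwise_training_sets` are hypothetical.

```python
from itertools import combinations
from math import comb


def pairwise_training_sets(n_sources: int) -> int:
    """Number of source pairs, i.e. C(n, 2) = n * (n - 1) / 2.

    Assumption: a pairwise matcher needs one training set per pair
    of Web databases, as the abstract's cost argument implies.
    """
    return comb(n_sources, 2)


# Hypothetical Web databases, used only for illustration.
sources = ["db_a", "db_b", "db_c", "db_d"]

# Every unordered pair of databases that would need its own training set.
pairs = list(combinations(sources, 2))

for left, right in pairs:
    print(f"training set needed for ({left}, {right})")

print(f"{len(sources)} databases -> {pairwise_training_sets(len(sources))} pairwise training sets")
```

A single matcher built over all databases, which is the goal stated in the abstract, would avoid this quadratic growth in training effort.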
