Sixth International Conference on Semantics Knowledge and Grid

A Holistic Solution for Duplicate Entity Identification in Deep Web Data Integration


Abstract

The proliferation of the deep Web offers users a great opportunity to search for high-quality information on the Web. Duplicate entity identification is a necessary step in deep Web data integration: its goal is to discover duplicate records across the integrated Web databases for use in downstream applications (e.g., price-comparison services). However, most existing work addresses this problem only between two data sources, which is not practical for deep Web data integration systems. That is, a duplicate entity matcher trained over two specific Web databases cannot be applied to other Web databases. In addition, the cost of preparing training sets for n Web databases is C_n^2 = n(n-1)/2 times that for two Web databases, since each pair of databases needs its own training set. In this paper, we propose a holistic solution to the new challenges posed by the deep Web; its goal is to build one duplicate entity matcher over multiple Web databases. Extensive experiments on two domains show that the proposed solution is highly effective for deep Web data integration.
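The cost argument in the abstract can be illustrated with a minimal Python sketch. It assumes that a pairwise approach needs one training set (and one matcher) per unordered pair of Web databases; the database names and the function name are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import comb


def pairwise_training_sets(sources):
    """Return every unordered pair of sources.

    Assumption: a pairwise matcher needs one training set per pair,
    so the number of pairs is the number of training sets to prepare.
    """
    return list(combinations(sources, 2))


# Hypothetical Web databases in one domain.
sources = ["db_a", "db_b", "db_c", "db_d", "db_e"]
pairs = pairwise_training_sets(sources)

# The pair count equals C_n^2 = n(n-1)/2.
assert len(pairs) == comb(len(sources), 2)
print(len(pairs))  # 10
```

With five databases the pairwise approach already needs ten training sets, whereas a single matcher built over all databases, as the abstract proposes, would need only one.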
