Venue: International Conference on Very Large Data Bases

Comparative Analysis of Approximate Blocking Techniques for Entity Resolution



Abstract

Entity Resolution is a core task for merging data collections. Due to its quadratic complexity, it typically scales to large volumes of data through blocking: similar entities are clustered into blocks and pair-wise comparisons are executed only between co-occurring entities, at the cost of some missed matches. There are numerous blocking methods, and the aim of this work is to offer a comprehensive empirical survey, extending the dimensions of comparison beyond what is commonly available in the literature. We consider 17 state-of-the-art blocking methods and use 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency. We also investigate their scalability over a corpus of 7 established synthetic datasets that range from 10,000 to 2 million entities.
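The trade-off described above can be illustrated with a minimal sketch of token blocking, a simple schema-agnostic scheme chosen here only for illustration; it is not necessarily one of the 17 methods the paper evaluates. The records, the ground-truth matches, and the function names `token_blocking` and `candidate_pairs` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Map each lowercase token to the set of entity ids whose text contains it."""
    blocks = defaultdict(set)
    for entity_id, text in entities.items():
        for token in text.lower().split():
            blocks[token].add(entity_id)
    # Blocks with a single entity yield no comparisons, so drop them.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of entities that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy records, not taken from the paper's datasets.
entities = {
    1: "John Smith Boston",
    2: "J. Smith Boston",
    3: "Maria Garcia Madrid",
    4: "Maria Garcia",
    5: "Jon Smyth",
}
true_matches = {(1, 2), (3, 4), (1, 5)}  # assumed ground truth

blocks = token_blocking(entities)
candidates = candidate_pairs(blocks)
all_pairs = len(entities) * (len(entities) - 1) // 2
found = candidates & true_matches

print(f"comparisons: {len(candidates)} of {all_pairs}")
print(f"matches retained: {len(found)} of {len(true_matches)}")
```

In this toy input, blocking cuts the comparisons from 10 to 2, but the assumed match between records 1 and 5 is lost because the two records share no token. That is the kind of missed match the abstract refers to.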
