Conference: ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Eliminating the Redundancy in Blocking-based Entity Resolution Methods

Abstract

Entity resolution is the task of identifying entities that refer to the same real-world object. It has important applications in digital libraries, such as citation matching and author disambiguation. Blocking is an established methodology for addressing this problem efficiently: it clusters similar entities together and compares only entities within the same cluster. To cope with today's large, noisy and heterogeneous data collections, novel blocking methods that rely on redundancy have been introduced: they associate each entity with multiple blocks in order to increase recall, which also increases the computational cost, since the same pair of entities may then be compared in several blocks. In this paper, we introduce novel techniques that remove the superfluous comparisons from any redundancy-based blocking method. They improve its time efficiency without any impact on the end result. We present the optimal solution to this problem, which discards all redundant comparisons at the cost of quadratic space complexity. For applications with space limitations, we also present an alternative, lightweight solution that operates at the abstract level of blocks in order to discard a significant part of the redundant comparisons. We evaluate our techniques on two large, real-world data sets and verify the significant improvements they yield when integrated into existing blocking methods.
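To make the distinction between redundancy-based blocking, an exact (pair-level) filter and a lighter block-level filter concrete, here is a minimal Python sketch. It is not the authors' implementation and makes several assumptions: token blocking (one block per token) stands in for "a redundancy-based blocking method", which the abstract does not name; `exact_dedup` simply remembers every executed pair in a set, whose size can grow quadratically with the number of entities; and `drop_subsumed_blocks` is a hypothetical block-level heuristic that discards any block whose entities are all contained in another kept block. The function names and the toy data are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy collection: entity id -> textual profile.
ENTITIES = {
    "e1": "Jane Smith digital libraries",
    "e2": "J. Smith digital libraries",
    "e3": "Jane Smith entity resolution",
    "e4": "John Doe digital libraries",
}


def token_blocking(entities):
    """Assumed redundancy-based blocking: one block per token, so an entity
    is placed in as many blocks as it has distinct tokens."""
    blocks = defaultdict(set)
    for eid, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(eid)
    # Blocks holding a single entity produce no comparisons.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def all_comparisons(blocks):
    """Every pair compared inside every block, repetitions included."""
    return [pair for ids in blocks.values()
            for pair in combinations(sorted(ids), 2)]


def exact_dedup(blocks):
    """Pair-level filter: remember each executed pair and skip repeats.
    Removes every redundant comparison, but `seen` can hold O(n^2) pairs."""
    seen, kept = set(), []
    for pair in all_comparisons(blocks):
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


def drop_subsumed_blocks(blocks):
    """Hypothetical block-level filter: a block whose entities are all
    contained in an already kept block only repeats comparisons, so it is
    discarded. Cheap, but overlapping blocks can still share pairs."""
    ordered = sorted(blocks.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept = {}
    for key, ids in ordered:
        if not any(ids <= other for other in kept.values()):
            kept[key] = ids
    return kept


if __name__ == "__main__":
    blocks = token_blocking(ENTITIES)
    print(len(all_comparisons(blocks)))                        # 10
    print(len(exact_dedup(blocks)))                            # 5
    print(len(all_comparisons(drop_subsumed_blocks(blocks))))  # 6
```

On this toy input the block-level filter keeps only the `digital` and `smith` blocks, so the pair (e1, e2) is still compared twice. This mirrors the trade-off described above: the pair-level filter removes every repeat but needs memory for all executed pairs, while the block-level filter needs no per-pair bookkeeping yet removes only part of the redundancy.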