ACM Transactions on Knowledge Discovery from Data

ProgressER: Adaptive Progressive Approach to Relational Entity Resolution


Abstract

Entity resolution (ER) is the process of identifying which entities in a dataset refer to the same real-world object. In relational ER, the dataset consists of multiple entity-sets and relationships among them. Such relationships cause the resolution of some entities to influence the resolution of other entities. For instance, consider a relational dataset that consists of a set of research paper entities and a set of venue entities. In such a dataset, deciding that two research papers are the same may imply that their venues are also the same. This article proposes a progressive approach to relational ER, named ProgressER, that aims to produce the highest quality result given a user-specified constraint on the resolution budget. Such a progressive approach is useful for many emerging analytical applications that require low-latency responses (and thus cannot tolerate delays caused by cleaning the entire dataset) and/or in situations where the underlying resources are constrained or costly to use. To maximize the quality of the result, ProgressER follows an adaptive strategy that periodically monitors and reassesses the resolution progress to determine which parts of the dataset should be resolved next and how they should be resolved. More specifically, ProgressER divides the input budget into several resolution windows and analyzes the resolution progress at the beginning of each window to generate a resolution plan for the current window. A resolution plan specifies which blocks of entities and which entity pairs within blocks need to be resolved during the plan execution phase of that window. In addition, ProgressER specifies, for each identified pair of entities, the order in which the similarity functions should be applied to the pair. Such an order plays a significant role in reducing the overall cost because applying the first few functions in this order might be sufficient to resolve the pair. The empirical evaluation of ProgressER demonstrates its significant advantage in terms of progressiveness over traditional ER techniques for the given problem settings.
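
The windowed strategy described in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: the function names, the per-function costs, the match threshold, and the length-difference ranking used in the planning step are all assumptions chosen for illustration. It only shows the general shape of the idea: split a budget into windows, re-plan at the start of each window, and apply similarity functions in a fixed order so that a cheap function can settle a pair before a costlier one runs.

```python
from itertools import combinations


def exact_match(a, b):
    """Cheap check (hypothetical): case-insensitive string equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_jaccard(a, b):
    """Costlier check (hypothetical): Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def resolve_pair(a, b, sim_funcs, threshold=0.9):
    """Apply (function, cost) entries in order; stop at the first confident match.

    Returns (is_match, cost_spent). The threshold is an assumed value.
    """
    cost = 0
    for func, func_cost in sim_funcs:
        cost += func_cost
        if func(a, b) >= threshold:
            return True, cost
    return False, cost


def progressive_resolve(blocks, sim_funcs, budget, window_size):
    """Resolve pairs within blocks, window by window, until the budget runs out."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    matches, resolved, spent = set(), set(), 0
    while spent < budget:
        # Planning phase: collect the pairs not yet resolved and rank them.
        # The ranking (smaller length difference first) is a placeholder
        # for a real benefit estimate.
        pending = sorted({
            pair
            for block in blocks
            for pair in combinations(sorted(block), 2)
        } - resolved)
        if not pending:
            break
        pending.sort(key=lambda p: abs(len(p[0]) - len(p[1])))
        window_limit = min(spent + window_size, budget)
        # Execution phase: resolve planned pairs until this window is used up.
        for a, b in pending:
            if spent >= window_limit:
                break
            is_match, cost = resolve_pair(a, b, sim_funcs)
            spent += cost
            resolved.add((a, b))
            if is_match:
                matches.add((a, b))
    return matches, spent


blocks = [["VLDB 2010", "vldb 2010", "VLDB Journal"], ["ICDE", "icde"]]
sim_funcs = [(exact_match, 1), (token_jaccard, 5)]
matches, spent = progressive_resolve(blocks, sim_funcs, budget=2, window_size=1)
print(matches, spent)
```

In this sketch the budget is a soft limit: a pair's cost is only known after it has been resolved, so the final pair may push the total slightly over the budget.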
