
Pay-As-You-Go Entity Resolution

Abstract

Entity resolution (ER) is the problem of identifying which records in a database refer to the same entity. In practice, many applications need to resolve large data sets efficiently, but do not require the ER result to be exact. For example, people data from the web may simply be too large to completely resolve with a reasonable amount of work. As another example, real-time applications may not be able to tolerate any ER processing that takes longer than a certain amount of time. This paper investigates how we can maximize the progress of ER with a limited amount of work using “hints,” which give information on records that are likely to refer to the same real-world entity. A hint can be represented in various formats (e.g., a grouping of records based on their likelihood of matching), and ER can use this information as a guideline for which records to compare first. We introduce a family of techniques for constructing hints efficiently and techniques for using the hints to maximize the number of matching records identified using a limited amount of work. Using real data sets, we illustrate the potential gains of our pay-as-you-go approach compared to running ER without using hints.
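To make the idea of a hint concrete, the following is a minimal sketch, not the paper's actual algorithms, which the abstract does not spell out. It assumes the hint is a list of record pairs sorted by an inexpensive similarity estimate, and that "work" is measured as the number of expensive match calls. The names cheap_score, is_match, build_hint, and resolve, along with the toy records, are hypothetical and introduced only for illustration.

```python
from itertools import combinations

def cheap_score(a, b):
    """Hypothetical inexpensive estimate: Jaccard overlap of name tokens."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def is_match(a, b):
    """Stand-in for an expensive match function (an assumption for this sketch)."""
    return a["email"] == b["email"]

def build_hint(records):
    """Hint: index pairs ordered by decreasing estimated likelihood of matching.

    Scoring every pair is itself quadratic; it is done here only to keep the
    sketch short, whereas the paper is about constructing hints efficiently.
    """
    pairs = combinations(range(len(records)), 2)
    return sorted(pairs, key=lambda p: -cheap_score(records[p[0]], records[p[1]]))

def resolve(records, pair_order, budget):
    """Run at most `budget` expensive comparisons in the given order.

    Returns the number of matching pairs found and a cluster label per record.
    """
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    matches = 0
    for i, j in pair_order[:budget]:
        if is_match(records[i], records[j]):
            matches += 1
            parent[find(i)] = find(j)
    return matches, [find(i) for i in range(len(records))]

# Toy data, invented for this example.
records = [
    {"name": "Alice Smith", "email": "a@x.org"},
    {"name": "Bob Jones", "email": "b@x.org"},
    {"name": "Carol White", "email": "c@x.org"},
    {"name": "Bob R Jones", "email": "b@x.org"},
    {"name": "Alice B Smith", "email": "a@x.org"},
]

budget = 2  # only two expensive comparisons are allowed
naive_order = list(combinations(range(len(records)), 2))
naive_matches, _ = resolve(records, naive_order, budget)
hinted_matches, hinted_labels = resolve(records, build_hint(records), budget)
print("matches without hint:", naive_matches)
print("matches with hint:", hinted_matches)
```

With the same budget, the hinted order spends its comparisons on the pairs most likely to match, which is the pay-as-you-go behavior the abstract describes. The actual gain depends on how well the cheap estimate predicts the outcome of the expensive match function.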
