Source: VLDB Journal
Generic entity resolution with negative rules

Abstract

Entity resolution (ER) (also known as deduplication or merge-purge) is a process of identifying records that refer to the same real-world entity and merging them together. In practice, ER results may contain "inconsistencies," either due to mistakes by the match and merge function writers or changes in the application semantics. To remove the inconsistencies, we introduce "negative rules" that disallow inconsistencies in the ER solution (ER-N). A consistent solution is then derived based on the guidance from a domain expert. The inconsistencies can be resolved in several ways, leading to accurate solutions. We formalize ER-N, treating the match, merge, and negative rules as black boxes, which permits expressive and extensible ER-N solutions. We identify important properties for the rules that, if satisfied, enable less costly ER-N. We develop and evaluate two algorithms that find an ER-N solution based on guidance from the domain expert: the GNR algorithm that does not assume the properties and the ENR algorithm that exploits the properties.
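To make the black-box framing concrete, the following is a minimal sketch in Python. It is not the GNR or ENR algorithm from the paper; it only shows a naive merge-until-fixed-point resolver that takes the match and merge functions as opaque callables, followed by a check that reports where negative rules are violated. The record representation, the function names (`resolve`, `violations`), the split into a single-record rule and a pairwise rule, and the sample data are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumed representation: a record is a set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def values(record: Record, attr: str) -> set:
    """All values a record holds for one attribute."""
    return {v for a, v in record if a == attr}


def resolve(records: List[Record],
            match: Callable[[Record, Record], bool],
            merge: Callable[[Record, Record], Record]) -> List[Record]:
    """Naive ER: keep merging matching records until no pair matches.

    match and merge are treated as black boxes, as in the abstract.
    """
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        r = pending.pop()
        partner = next((s for s in resolved if match(r, s)), None)
        if partner is None:
            resolved.append(r)
        else:
            # Re-queue the merged record so it is compared again.
            resolved.remove(partner)
            pending.append(merge(r, partner))
    return resolved


def violations(solution: List[Record],
               record_rule: Callable[[Record], bool],
               pair_rule: Callable[[Record, Record], bool]):
    """Report records and record pairs flagged by the negative rules."""
    bad_records = [r for r in solution if record_rule(r)]
    bad_pairs = [(r, s) for i, r in enumerate(solution)
                 for s in solution[i + 1:] if pair_rule(r, s)]
    return bad_records, bad_pairs


# Hypothetical rules: match on a shared name or phone, merge by union.
def match(r: Record, s: Record) -> bool:
    return any(values(r, a) & values(s, a) for a in ("name", "phone"))


def merge(r: Record, s: Record) -> Record:
    return r | s


def two_genders(r: Record) -> bool:
    # Negative rule on one record: conflicting gender values.
    return len(values(r, "gender")) > 1


def same_ssn(r: Record, s: Record) -> bool:
    # Negative rule on a pair: distinct records must not share an SSN.
    return bool(values(r, "ssn") & values(s, "ssn"))


records = [
    frozenset({("name", "Pat"), ("phone", "111"), ("gender", "F"), ("ssn", "9")}),
    frozenset({("name", "Pat"), ("phone", "222")}),
    frozenset({("name", "P. Smith"), ("phone", "222"), ("gender", "M")}),
    frozenset({("name", "Lee"), ("phone", "333"), ("ssn", "9")}),
]

solution = resolve(records, match, merge)
bad_records, bad_pairs = violations(solution, two_genders, same_ssn)
```

In this toy data the first three records are merged transitively, which yields one record carrying two gender values, while the fourth record stays separate even though it shares an SSN with the merged one. Both outcomes are the kind of inconsistency a negative rule is meant to flag. Choosing how to repair them, with guidance from a domain expert, is the part the paper's algorithms address and is not attempted here.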