Journal: Distributed and Parallel Databases

An effective weighted rule-based method for entity resolution

Abstract

Entity resolution is an important data-cleaning task that detects records belonging to the same entity. It is critical for digital libraries, where different entities share the same name without any identifier key. Conventional methods adopt similarity measures and clustering techniques to reveal the records of a specific entity. Because these methods perform poorly, recent methods build rules on records' attributes with values that are distinct to each entity, which overcomes some of the drawbacks. However, they use too few attributes and ignore common and empty attribute values, which affects the quality of entity resolution. In this paper, we define a multi-attributes weighted rule system (MAWR) that investigates all values of records' attributes in order to represent the difficult record-entity mapping. Then, we propose a rule generation algorithm based on this system. We also propose an entity resolution algorithm (MAWR-ER) that uses the generated rules to identify entities. We verify our method on real data, and the experimental results demonstrate the effectiveness and efficiency of our proposed method.
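
The abstract does not give the rule format or the weighting scheme, so the following is only a minimal sketch of the general idea of weighted attribute rules, not the paper's MAWR or MAWR-ER algorithms. It assumes that each rule maps one (attribute, value) pair to an entity, that a rule's weight is the share of labeled records with that value belonging to the entity, and that empty values are kept as a value of their own. The names `generate_rules`, `resolve`, `EMPTY`, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict

EMPTY = "<empty>"  # hypothetical placeholder so empty values still yield rules


def generate_rules(labeled_records):
    """Build rules[(attribute, value)] -> {entity: weight} from labeled records.

    Assumed weighting (not from the paper): the share of records carrying
    the value that belong to the entity. A value seen for one entity only
    gets weight 1.0; a value shared by several entities gets less for each.
    """
    counts = defaultdict(Counter)
    for record, entity in labeled_records:
        for attribute, value in record.items():
            counts[(attribute, value or EMPTY)][entity] += 1
    rules = {}
    for key, by_entity in counts.items():
        total = sum(by_entity.values())
        rules[key] = {entity: n / total for entity, n in by_entity.items()}
    return rules


def resolve(record, rules):
    """Return the entity with the highest summed rule weight, or None."""
    scores = Counter()
    for attribute, value in record.items():
        matched = rules.get((attribute, value or EMPTY), {})
        for entity, weight in matched.items():
            scores[entity] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical digital-library records: two authors who share one name.
labeled = [
    ({"name": "Wei Wang", "coauthor": "J. Li", "venue": "VLDB"}, "e1"),
    ({"name": "Wei Wang", "coauthor": "J. Li", "venue": "ICDE"}, "e1"),
    ({"name": "Wei Wang", "coauthor": "H. Chen", "venue": "ICDE"}, "e2"),
    ({"name": "Wei Wang", "coauthor": "", "venue": "CVPR"}, "e2"),
]
rules = generate_rules(labeled)
print(resolve({"name": "Wei Wang", "coauthor": "J. Li", "venue": "ICDE"}, rules))
```

In this toy data the shared name carries equal weight for both entities, so the decision rests on the coauthor and venue values; the empty coauthor field also contributes a rule rather than being discarded.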
