Conference: International conference on scalable uncertainty management

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution


Abstract

Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate representations of the same external entities in data and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating three components of ER: (a) classifiers for duplicate/non-duplicate record pairs, built using machine learning (ML) techniques; (b) MDs for supporting both the blocking phase of ML and the merge itself; and (c) the use of the declarative language LogiQL (an extended form of Datalog supported by the LogicBlox platform) for data processing and for the specification and enforcement of MDs.
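The abstract describes a pipeline in which MDs drive blocking, an ML classifier labels candidate pairs, and MDs drive the final merge. An MD has the general form R[X] ≈ R[X] → R[Y] ≐ R[Y]: if the values on the left are similar, the values on the right are made identical. The Python sketch below illustrates only that data flow. It is written in plain Python rather than LogiQL, and every concrete choice is a hypothetical assumption, not ERBlox's actual setup: token-set Jaccard similarity with a 0.5 threshold, a blocking MD over a `title` attribute that identifies block ids, a rule-based `classify` stand-in in place of a trained classifier, and a keep-the-longer-value matching function `match_title`. Enforcing the blocking MD until nothing changes amounts to a transitive closure, which the sketch computes with union-find.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest; used to take the transitive closure of MD enforcement."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed similarity predicate, not from ERBlox)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similar(a: str, b: str, threshold: float = 0.5) -> bool:
    return jaccard(a, b) >= threshold


def block_with_md(records: list[dict]) -> list[int]:
    """Hypothetical blocking MD: R[title] ≈ R[title] -> R[block] ≐ R[block].
    Enforcing it until nothing changes is a transitive closure over similar pairs."""
    uf = UnionFind(len(records))
    for i, j in combinations(range(len(records)), 2):
        if similar(records[i]["title"], records[j]["title"]):
            uf.union(i, j)
    return [uf.find(i) for i in range(len(records))]


def classify(r1: dict, r2: dict) -> bool:
    """Rule-based stand-in for a trained duplicate/non-duplicate ML classifier."""
    return similar(r1["title"], r2["title"]) and r1["year"] == r2["year"]


def match_title(a: str, b: str) -> str:
    """Hypothetical matching function for the merge MD: keep the longer value."""
    return a if len(a) >= len(b) else b


def resolve(records: list[dict]) -> list[dict]:
    blocks = block_with_md(records)
    dup = UnionFind(len(records))
    # Only pairs that share a block are sent to the classifier.
    for i, j in combinations(range(len(records)), 2):
        if blocks[i] == blocks[j] and classify(records[i], records[j]):
            dup.union(i, j)
    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(dup.find(i), []).append(i)
    # Merge MD: each duplicate cluster gets one title from the matching function.
    merged = []
    for members in clusters.values():
        rec = dict(records[members[0]])
        for m in members[1:]:
            rec["title"] = match_title(rec["title"], records[m]["title"])
        merged.append(rec)
    return merged


records = [
    {"title": "Entity resolution with matching dependencies", "year": 2014},
    {"title": "entity resolution with matching dependencies and ML", "year": 2014},
    {"title": "Entity resolution with matching dependencies", "year": 2010},
    {"title": "Scalable uncertainty management", "year": 2015},
]
for rec in resolve(records):
    print(rec)
```

In this toy input, the third record shares a block with the first two because its title is similar. The stand-in classifier still rejects it because the year differs. Blocking therefore only narrows the set of candidate pairs, while the classifier makes the duplicate decision and the merge MD produces the single representation.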
