
An unsupervised blocking technique for more efficient record linkage


Abstract

Record linkage, also referred to as entity resolution, is the process of identifying pairs of records that represent the same real-world entity (for example, a person) within a dataset or across multiple datasets. This enables the integration of multi-source data, which in turn supports better knowledge discovery. To reduce the number of record comparisons, record linkage frameworks first perform a process commonly referred to as blocking, which separates records into blocks using a partition (or blocking) scheme. During the linkage process, comparisons are then restricted to records that belong to the same block. Existing blocking techniques often require some form of manual fine-tuning of parameter values for optimal performance. Optimal parameter values may be selected manually by a domain expert or learned automatically from labelled data. However, in many real-world situations no such labelled dataset is available. In this paper we propose a novel unsupervised blocking technique for structured datasets that requires neither labelled data nor manual fine-tuning of parameters. Experimental evaluations across a large number of datasets demonstrate that this approach often achieves better performance than both supervised and unsupervised baseline techniques, often in less time.
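
To make the idea of blocking concrete, the following is a minimal sketch of generic key-based blocking. It is not the unsupervised technique proposed in the paper, whose details the abstract does not give. The record fields (`id`, `surname`, `birth_year`) and the `blocking_key` scheme are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking scheme: surname initial plus birth year."""
    return (record["surname"][:1].lower(), record["birth_year"])


def build_blocks(records, key_fn=blocking_key):
    """Partition record ids into blocks according to the blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_fn(record)].append(record["id"])
    return blocks


def candidate_pairs(blocks):
    """Yield only those pairs of record ids that share a block."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


# Toy data, invented for this example.
records = [
    {"id": 1, "surname": "Smith", "birth_year": 1980},
    {"id": 2, "surname": "Smyth", "birth_year": 1980},
    {"id": 3, "surname": "Jones", "birth_year": 1975},
    {"id": 4, "surname": "Jonas", "birth_year": 1975},
    {"id": 5, "surname": "Brown", "birth_year": 1990},
]

pairs = list(candidate_pairs(build_blocks(records)))
all_pairs = len(records) * (len(records) - 1) // 2  # comparisons without blocking

print(pairs)
print(f"{len(pairs)} of {all_pairs} possible comparisons remain")
```

With this toy scheme only 2 of the 10 possible record pairs are compared. The trade-off is that a poorly chosen key can place true matches in different blocks, which is why blocking schemes usually need tuning or learning.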

Bibliographic details

  • Source
    Data & Knowledge Engineering | 2019, Issue 7 | pp. 181-195 | 15 pages
  • Author affiliations

    Queens Univ Belfast Sch Elect Elect Engn & Comp Sci Comp Sci Bldg 18 Malone Rd Belfast BT9 5BN Antrim North Ireland;

    Queens Univ Belfast Sch Elect Elect Engn & Comp Sci Comp Sci Bldg 18 Malone Rd Belfast BT9 5BN Antrim North Ireland;

    Univ Utrecht Buys Ballotgebouw NL-3584 CC Utrecht Netherlands;

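  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords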

    Unsupervised blocking; Record linkage; Entity resolution;
