An unsupervised blocking technique for more efficient record linkage

Abstract

Record linkage, also referred to as entity resolution, is the process of identifying pairs of records that represent the same real-world entity (for example, a person) within a dataset or across multiple datasets. This enables the integration of multi-source data, which in turn supports better knowledge discovery. To reduce the number of record comparisons, record linkage frameworks first perform a process commonly referred to as blocking, which separates records into blocks using a partition (or blocking) scheme. During the linkage process, comparisons are then restricted to records that belong to the same block. Existing blocking techniques often require some form of manual fine-tuning of parameter values to achieve optimal performance. Optimal parameter values may be selected manually by a domain expert or learned automatically from labelled data. However, in many real-world situations no such labelled dataset may be available. In this paper we propose a novel unsupervised blocking technique for structured datasets that requires neither labelled data nor manual fine-tuning of parameters. Experimental evaluations across a large number of datasets demonstrate that this approach frequently outperforms both supervised and unsupervised baseline techniques, and often does so in less time.
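To make the blocking step concrete, here is a minimal Python sketch of conventional key-based ("standard") blocking. It is not the authors' unsupervised technique. The helper names `block_records`, `candidate_pairs` and `surname_postcode_key`, the blocking key itself, and the toy records are all hypothetical, chosen only to show how blocking restricts comparisons to records within the same block.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, blocking_key):
    """Partition record IDs into blocks keyed by blocking_key(record)."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[blocking_key(rec)].append(rec_id)
    return dict(blocks)


def candidate_pairs(blocks):
    """Return the record-ID pairs that share a block; only these get compared."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def surname_postcode_key(rec):
    # Hypothetical hand-picked key: surname initial + postcode district.
    # A manually chosen key like this is the kind of tuning the abstract
    # says an unsupervised blocking technique should avoid.
    return (rec["surname"][:1].lower(), rec["postcode"].split()[0])


# Toy records (illustrative only, not taken from the paper's datasets).
records = {
    1: {"surname": "Smith", "postcode": "BT9 5BN"},
    2: {"surname": "Smyth", "postcode": "BT9 5AA"},
    3: {"surname": "Jones", "postcode": "BT9 5BN"},
    4: {"surname": "Smith", "postcode": "BT7 1NN"},
}

blocks = block_records(records, surname_postcode_key)
pairs = candidate_pairs(blocks)
n = len(records)
all_pairs = n * (n - 1) // 2  # comparisons needed without any blocking
print(f"{len(pairs)} of {all_pairs} pairs compared: {sorted(pairs)}")
```

Here only one of the six possible pairs is compared. Record 4 shares a surname with record 1 but falls into a different block, so that pair is never examined. This efficiency-versus-recall trade-off is why the choice of blocking scheme matters.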

Bibliographic details

  • Source
    Data & Knowledge Engineering | 2019, No. 7 | pp. 181-195 | 15 pages
  • Author affiliations

    Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Computer Science Building, 18 Malone Road, Belfast BT9 5BN, Antrim, Northern Ireland;

    Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Computer Science Building, 18 Malone Road, Belfast BT9 5BN, Antrim, Northern Ireland;

    Utrecht University, Buys Ballotgebouw, NL-3584 CC Utrecht, Netherlands;

  • Indexed in: Science Citation Index (SCI, US); Engineering Index (EI, US)
  • Original format: PDF
  • Language: English
  • Keywords

    Unsupervised blocking; Record linkage; Entity resolution;