Entity Resolution with Recursive Blocking

Yu Shao-Qing

Abstract

Entity resolution is a well-known challenge in data management: records lack unique identifiers, and various errors hidden in the data undermine the identifiability of the entities they refer to. To reveal matching records, every record potentially needs to be compared with every other record in the database, which is computationally prohibitive even for moderately sized databases. To circumvent this quadratic cost, blocking methods are typically employed to restrict comparisons to promising pairs within small subsets of records, called blocks. Existing effective methods typically rely on blocking keys created by experts to capture matches, which inevitably involves a large amount of human labor and does not guarantee high-quality results. To reduce manual labor and improve accuracy, machine learning approaches have been investigated, but with limited success, owing to their high demand for training data and their inefficiency, especially on large databases. The exhaustive method produces exact results but suffers from efficiency problems. In this paper, we propose a divide-and-conquer paradigm for entity resolution, named recursive blocking, which derives comparatively good results while largely alleviating efficiency concerns. Specifically, recursive blocking refines blocks and traps matches in an iterative fashion to derive high-quality results. We study two types of recursive blocking, namely redundancy-based and partition-based approaches, and investigate their relative performance. Comprehensive experiments on both real-world and synthetic datasets verified the superiority of our approaches over existing ones. (C) 2020 Elsevier Inc. All rights reserved.
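
The gap between exhaustive comparison and blocking described in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm, which the abstract does not specify. It is a minimal partition-style illustration under stated assumptions: the blocking key is a lowercased string prefix, a block is split again with a longer prefix while it holds more than `max_block_size` records, and the function names, the parameter, and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def exhaustive_pairs(records):
    """All unordered record pairs: n * (n - 1) / 2 comparisons."""
    return set(combinations(range(len(records)), 2))


def recursive_blocks(records, ids=None, depth=1, max_block_size=2):
    """Split a block with a longer key prefix until it is small enough.

    Assumed (hypothetical) refinement rule: the blocking key is the first
    `depth` characters of the lowercased record string.
    """
    if ids is None:
        ids = list(range(len(records)))
    if len(ids) <= max_block_size:
        return [ids]  # block is already small enough
    longest = max(len(records[i]) for i in ids)
    if depth > longest:
        return [ids]  # the key cannot be refined any further
    groups = defaultdict(list)
    for i in ids:
        groups[records[i].lower()[:depth]].append(i)
    blocks = []
    for group in groups.values():
        # Recurse on each sub-block with a more specific key.
        blocks.extend(recursive_blocks(records, group, depth + 1, max_block_size))
    return blocks


def blocked_pairs(blocks):
    """Candidate pairs are generated only inside each block."""
    pairs = set()
    for block in blocks:
        pairs.update(combinations(sorted(block), 2))
    return pairs


# Toy data, invented for illustration only.
records = ["Jon Smith", "John Smith", "Jane Doe", "Jane  Doe", "Bob Lee", "Rob Lee"]
blocks = recursive_blocks(records)
candidates = blocked_pairs(blocks)
print(len(exhaustive_pairs(records)), "exhaustive pairs")
print(len(candidates), "candidate pairs after recursive blocking")
```

On these six toy records the exhaustive approach yields 15 pairs, while the sketch keeps 2 candidate pairs. It also misses the pair "Bob Lee" / "Rob Lee", because a single prefix key sends the two records to different blocks. Losses of this kind are what placing a record in several blocks, as redundancy-based blocking generally does, is meant to reduce.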


Bibliographic details

Source
Big Data Research, 2020, Issue 1, 17 pages
Author
Yu Shao-Qing
Author affiliation
Peking Univ Sch EECS Key Lab High Confidence Software Technol MOE Beijing Peoples R China
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Entity resolution; Record linkage; Data integration; Big data

