
A novel approach to improve the Record Linkage process


Abstract

Organizations around the world lose trillions of dollars because of poor data quality. In recent years, growing awareness of the importance of data quality has led stakeholders to invest heavily in improving the quality of their stored data. One of the main processes in the data quality field is Record Linkage (RL), the process of identifying the tuples that refer to the same real-world entity. Without blocking, RL compares every record with every other record, which can mean billions of comparisons on large datasets. Blocking reduces the number of comparisons by dividing the data into blocks so that only records in the same block are compared with each other. In this paper, we propose a novel record linkage approach that uses the K-Modes algorithm as the blocking step. K-Modes offers a major advantage because it operates directly on categorical data. The results obtained show that our proposal is a powerful approach to record linkage, outperforming most of the approaches in the literature.
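The idea of using K-Modes as a blocking step can be illustrated with a short sketch. It is not the authors' implementation: the abstract gives no details on initialization, distance, or parameters, so the function names (`hamming`, `k_modes`, `candidate_pairs`), the deterministic seeding, the choice of k = 2, and the toy records below are all assumptions made for illustration. Records are clustered by simple attribute mismatch (Hamming distance) against per-cluster modes, and only records that land in the same cluster become candidate pairs.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of attribute positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=20):
    """Minimal K-Modes sketch: returns one cluster label per record.

    Assumption: the first k distinct records seed the modes, which keeps
    the example deterministic. Real implementations use smarter seeding.
    """
    modes = list(dict.fromkeys(records))[:k]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode (ties go to the lower index).
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Recompute each mode as the most frequent value per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels


def candidate_pairs(labels):
    """Index pairs of records that share a block (cluster label)."""
    blocks = {}
    for idx, label in enumerate(labels):
        blocks.setdefault(label, []).append(idx)
    return [p for members in blocks.values() for p in combinations(members, 2)]


# Hypothetical toy data: (surname, first name, city) with typos.
records = [
    ("smith", "john", "london"),
    ("smith", "jon", "london"),
    ("garcia", "maria", "madrid"),
    ("garcia", "maria", "madird"),
    ("smith", "john", "londn"),
    ("garcia", "marie", "madrid"),
]

labels = k_modes(records, k=2)
pairs = candidate_pairs(labels)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs after blocking, {all_pairs} without")
```

Without blocking, n records require n(n-1)/2 comparisons. With blocking, the cost is the sum of that quantity over the blocks, which is why evenly sized blocks reduce the work most. A matching step (not shown) would then compare only the pairs returned by `candidate_pairs`.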
