Journal: SIGKDD Explorations

Development and User Experiences of an Open Source Data Cleaning, Deduplication and Record Linkage System


Abstract

Record linkage, also known as database matching or entity resolution, is now recognised as a core step in the KDD process. Data mining projects increasingly require that information from several sources be combined before the actual mining can be conducted. Also of increasing interest is the deduplication of a single database. The objectives of record linkage and deduplication are to identify, match and merge all records that relate to the same real-world entities. Because real-world data is commonly 'dirty', data cleaning is an important first step in many deduplication, record linkage, and data mining projects. In this paper, an overview of the Febrl (Freely Extensible Biomedical Record Linkage) system is provided, and the results of a recent survey of Febrl users are discussed. Febrl includes a variety of functionalities required for data cleaning, deduplication and record linkage, and it provides a graphical user interface that facilitates its application for users who do not have programming experience.
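The clean-then-match idea described in the abstract can be illustrated with a minimal sketch. This is not Febrl's API or its algorithms: the function names `clean`, `similarity` and `find_duplicates`, the sample records, and the 0.85 threshold are all hypothetical, and the string comparison uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real approximate string comparator. It covers only the identify and match steps, not merging.

```python
from difflib import SequenceMatcher
from itertools import combinations


def clean(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(kept.split())


def similarity(a: dict, b: dict, fields: tuple) -> float:
    """Average string similarity over the compared fields (0.0 to 1.0)."""
    scores = [SequenceMatcher(None, clean(a[f]), clean(b[f])).ratio() for f in fields]
    return sum(scores) / len(scores)


def find_duplicates(records: list, fields: tuple, threshold: float = 0.85) -> list:
    """Return index pairs whose average similarity reaches the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b, fields) >= threshold
    ]


# Hypothetical 'dirty' records: the first two describe the same entity.
records = [
    {"name": "Anna Miller", "city": "Canberra"},
    {"name": "anna  miller.", "city": "CANBERRA"},
    {"name": "Mary Jones", "city": "Sydney"},
]
print(find_duplicates(records, ("name", "city")))
```

The sketch compares every pair of records, so its cost grows quadratically with the number of records. That is acceptable for a toy example, but it would not scale to large databases without some way of limiting which pairs are compared.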
