Conference: International Conference on Very Large Data Bases

NADEEF: A Generalized Data Cleaning System



Abstract

We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well-known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by defining new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to effectively involve users in the data cleaning process.
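
To make the split between a programming interface and a core more concrete, the following is a minimal sketch of the idea, not NADEEF's actual API. Every name in it (`Rule`, `Violation`, `Fix`, `FunctionalDependency`, `TrimWhitespace`, `clean`) is hypothetical and chosen only for illustration. It assumes each rule implements a `detect` method (what is wrong) and optionally a `repair` method (how to fix it), and that a toy core applies rules of different types round by round until no further fix is proposed.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule: str          # name of the rule that was violated
    cells: tuple       # (row_index, column) pairs involved in the error


@dataclass(frozen=True)
class Fix:
    row: int
    column: str
    value: object      # proposed new value for the cell


class Rule(ABC):
    """Hypothetical rule contract: what is wrong, and (optionally) how to fix it."""

    name = "rule"

    @abstractmethod
    def detect(self, rows):
        """Return a list of Violation objects found in rows."""

    def repair(self, violation, rows):
        """Return candidate Fix objects; the default rule is detect-only."""
        return []


class FunctionalDependency(Rule):
    """lhs -> rhs: rows that agree on lhs must agree on rhs."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.name = f"FD({lhs}->{rhs})"

    def detect(self, rows):
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[self.lhs]].append(i)
        violations = []
        for indexes in groups.values():
            if len({rows[i][self.rhs] for i in indexes}) > 1:
                cells = tuple((i, self.rhs) for i in indexes)
                violations.append(Violation(self.name, cells))
        return violations

    def repair(self, violation, rows):
        # Simple heuristic (an assumption of this sketch): adopt the most frequent value.
        values = Counter(rows[i][col] for i, col in violation.cells)
        majority = values.most_common(1)[0][0]
        return [Fix(i, col, majority) for i, col in violation.cells
                if rows[i][col] != majority]


class TrimWhitespace(Rule):
    """ETL-style normalization rule for a single string column."""

    def __init__(self, column):
        self.column = column
        self.name = f"Trim({column})"

    def detect(self, rows):
        return [Violation(self.name, ((i, self.column),))
                for i, row in enumerate(rows)
                if row[self.column] != row[self.column].strip()]

    def repair(self, violation, rows):
        (i, col), = violation.cells
        return [Fix(i, col, rows[i][col].strip())]


def clean(rows, rules, max_rounds=10):
    """Toy core: run every rule each round until no rule proposes a fix."""
    rows = [dict(r) for r in rows]   # work on a copy, leave the input untouched
    for _ in range(max_rounds):
        fixes = [f for rule in rules
                 for v in rule.detect(rows)
                 for f in rule.repair(v, rows)]
        if not fixes:
            break
        for f in fixes:
            rows[f.row][f.column] = f.value
    return rows
```

Under these assumptions, a call such as `clean(rows, [TrimWhitespace("city"), FunctionalDependency("zip", "city")])` would run both rule types over the same table. A new rule type is added by subclassing `Rule`, without touching `clean`, which is the kind of extensibility the abstract describes.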
