Conference: International Conference on Information Systems

How Clean is Clean Enough? Determining the Most Effective Use of Resources in the Data Cleansing Process

Abstract

Poor data quality can have a significant impact on system and organizational performance. As data gathering and storage have grown, so has the number of data sources that must be merged in data warehouse and Enterprise Resource Planning (ERP) implementations. This makes data cleansing, as part of the implementation conversion, increasingly difficult. In this research we expand the traditional Extract-Transform-Load (ETL) process to identify sub-processes between the main stages. We then identify the decisions and tradeoffs involved in allocating time and resources, and in setting accuracy constraints, for the data cleansing process. We develop a mathematical model of the process to identify the optimal configuration of these factors. We use empirical data to test the feasibility of the proposed model. Multiple domain experts validate the range of constraints used for model testing. Three different levels of cleansing complexity are tested in the preliminary analysis to demonstrate the use and validity of the modeling process.
