Conference: International Conference on Information Systems

How Clean is Clean Enough? Determining the Most Effective Use of Resources in the Data Cleansing Process


Abstract

Poor data quality can significantly degrade system and organizational performance. As data gathering and storage have grown, so has the number of data sources that must be merged in data warehouse and Enterprise Resource Planning (ERP) implementations. This makes data cleansing, as part of the implementation conversion, increasingly difficult. In this research we expand the traditional Extract-Transform-Load (ETL) process to identify sub-processes between the main stages. We then identify the decisions and tradeoffs involved in allocating time and resources and in setting accuracy constraints for the data cleansing process. We develop a mathematical model of the process to identify the optimal configuration of these factors. We use empirical data to test the feasibility of the proposed model. Multiple domain experts validate the range of constraints used for model testing. Three levels of cleansing complexity are tested in the preliminary analysis to demonstrate the use and validity of the modeling process.
