2011 International Conference on Electrical Engineering and Informatics

Study of localized data cleansing process for ETL performance improvement in independent datamart

Abstract

Data warehouse practitioners consider ETL processing to be the largest effort in building a data warehouse today. The complexity of the ETL workload depends on the varied and heterogeneous profiles of the data sources to be collected. This study seeks to decrease the ETL processing workload in the data warehouse development stages by proposing a new concept of localized data source cleansing: inconsistent, non-formal, expected existing, and duplicated data in each localized data source profile should be identified locally. This is expected to lighten and shorten ETL processing, improving its workload performance. The impact of localized versus non-localized heterogeneous data cleansing was investigated, and based on this investigation an automatic localized data cleansing and integration system was defined. It performs cleansing for each data source profile at the transactional data source site, that is, before the data warehouse development stages begin. It was found that when the Automatic Data Cleansing process and the Data Integrator process are carried out sequentially, the ETL processing workload in the data warehouse development stages decreases. The reduction in raw data volume achieved through local cleansing proved significant for data lacking integrity constraints and format-checking procedures.
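The abstract gives no implementation, but the localized cleansing step it describes can be sketched briefly. The following Python snippet is a hypothetical illustration only: the record fields, the date-format rule, and the names cleanse_locally and is_consistent are assumptions, not taken from the paper. It shows duplicated, inconsistent, and non-formal rows being filtered at the transactional source site, so fewer raw rows reach the ETL stage.

```python
# Minimal sketch of a localized data cleansing pass, run at the
# transactional data source site before ETL. Record layout, format
# rules, and names are illustrative assumptions, not from the paper.
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed canonical date format

def is_consistent(row):
    """Format/consistency checks that a source lacking integrity
    constraints would otherwise push into the central ETL stage."""
    return (
        row.get("customer_id") is not None
        and bool(DATE_RE.match(row.get("order_date", "")))
        and isinstance(row.get("amount"), (int, float))
        and row["amount"] >= 0
    )

def cleanse_locally(rows, key_fields=("customer_id", "order_date")):
    """Drop duplicated and inconsistent rows at the source site,
    shrinking the raw-data volume handed to the ETL stage."""
    seen, cleansed = set(), []
    for row in rows:
        if not is_consistent(row):
            continue                      # inconsistent / non-formal row
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            continue                      # duplicated row
        seen.add(key)
        cleansed.append(row)
    return cleansed

if __name__ == "__main__":
    raw = [
        {"customer_id": 1, "order_date": "2011-07-17", "amount": 25.0},
        {"customer_id": 1, "order_date": "2011-07-17", "amount": 25.0},    # duplicate
        {"customer_id": None, "order_date": "2011-07-18", "amount": 10.0}, # inconsistent
        {"customer_id": 2, "order_date": "17/07/2011", "amount": 5.0},     # non-formal date
    ]
    print(cleanse_locally(raw))  # only the first row survives
```

A Data Integrator step, as the paper names it, would then merge the already-cleansed outputs of each source, which is why running the two processes sequentially reduces the downstream ETL workload.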