Conference paper, IEEE International Conference on Industrial Informatics

A framework for detecting unnecessary industrial data in ETL processes

Abstract

Extract, transform and load (ETL) is a critical process used by industrial organisations to move data from one database to another, such as from an operational system to a data warehouse. As the volume of data stored by industrial organisations grows, some ETL processes can take more than 12 hours to complete, leaving decision makers stranded while they wait for the data needed to support their decisions. Once ETL processes have been designed, data requirements will inevitably change, and much of the data that passes through the ETL process may never be used or needed. This paper therefore proposes a framework for dynamically detecting and predicting unnecessary data and preventing it from slowing down ETL processes, either by removing it entirely or by deprioritising it. Other advantages of the framework include the ability to prioritise data cleansing tasks and to determine which data should be processed first and placed into fast-access memory. We show existing example algorithms that can be used for each component of the framework, and present some initial testing results as part of our research to determine whether the framework can help to reduce ETL time.
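The abstract does not say which detection algorithms the framework uses. The following is a minimal sketch of one simple usage-based heuristic, written under our own assumptions. Each column that an ETL job loads is counted against a log of downstream warehouse queries. Columns that are queried often are scheduled first, rarely queried ones are deprioritised, and columns that are never queried are flagged as candidates for removal. The names `plan_etl` and `EtlPlan`, the query-log format (one set of column names per query) and both thresholds are hypothetical. They are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EtlPlan:
    load_first: list[str]     # frequently queried: process early / keep in fast memory
    deprioritised: list[str]  # rarely queried: process after the frequently used data
    skipped: list[str]        # never queried in the log window: candidates for removal


def plan_etl(columns, query_log, skip_threshold=0, hot_threshold=3):
    """Partition ETL output columns by how often downstream queries use them.

    columns: names of the columns an ETL job currently loads.
    query_log: one set of referenced column names per logged warehouse query
               (hypothetical input; a real system would derive it from query history).
    """
    known = set(columns)
    usage = Counter()
    for referenced in query_log:
        # Count only references to columns this ETL job actually produces.
        usage.update(known & set(referenced))

    # Most-used columns first; ties are broken alphabetically so the plan is deterministic.
    ranked = sorted(columns, key=lambda c: (-usage[c], c))
    return EtlPlan(
        load_first=[c for c in ranked if usage[c] >= hot_threshold],
        deprioritised=[c for c in ranked if skip_threshold < usage[c] < hot_threshold],
        skipped=[c for c in ranked if usage[c] <= skip_threshold],
    )


# Illustrative (made-up) columns and query log.
columns = ["order_id", "amount", "customer_id", "legacy_flag", "fax_number"]
query_log = [
    {"order_id", "amount"},
    {"order_id", "customer_id"},
    {"order_id", "amount"},
    {"amount"},
]
plan = plan_etl(columns, query_log)
print(plan)
```

With this log, `amount` and `order_id` are each referenced three times and go into `load_first`. `customer_id` is referenced once and is deprioritised. `legacy_flag` and `fax_number` are never referenced, so they are flagged for removal. A static usage count like this can only detect data that is already unused. Predicting data that will become unnecessary, as the abstract describes, would need a model of how usage changes over time.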