A framework for detecting unnecessary industrial data in ETL processes

Abstract

Extract, transform and load (ETL) is a critical process used by industrial organisations to move data from one database to another, such as from an operational system to a data warehouse. With the increasing amount of data stored by industrial organisations, some ETL processes can take in excess of 12 hours to complete; this can leave decision makers stranded while they wait for the data needed to support their decisions. After the ETL processes have been designed, data requirements can inevitably change, and much of the data that goes through the ETL process may never be used or needed. This paper therefore proposes a framework for dynamically detecting and predicting unnecessary data and preventing it from slowing down ETL processes, either by removing it entirely or by deprioritising it. Other advantages of the framework include the ability to prioritise data cleansing tasks and to determine which data should be processed first and placed into fast-access memory. We show existing example algorithms that can be used for each component of the framework, and present some initial testing results as part of our research to determine whether the framework can help to reduce ETL time.
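The abstract does not say which algorithms the framework uses, so the following is only a minimal sketch of the general idea of detecting unused data and deprioritising it. It assumes a simple usage-count heuristic rather than the authors' method; the function `rank_etl_items`, the `min_uses` threshold, and the table names are all hypothetical.

```python
from collections import Counter


def rank_etl_items(items, access_log, min_uses=1):
    """Split ETL items into a prioritised list and a deprioritised list.

    items: names of tables or columns the ETL job moves (hypothetical).
    access_log: iterable of item names, one entry per recorded downstream use.
    min_uses: assumed threshold below which an item counts as unnecessary.
    """
    uses = Counter(access_log)
    # Most-used items first; ties keep the original ETL order (sorted is stable).
    ranked = sorted(items, key=lambda name: -uses[name])
    needed = [name for name in ranked if uses[name] >= min_uses]
    deprioritised = [name for name in ranked if uses[name] < min_uses]
    return needed, deprioritised


# Hypothetical example data: four tables and a log of downstream reads.
items = ["orders", "audit_trail", "customers", "legacy_codes"]
access_log = ["orders", "customers", "orders", "orders", "customers"]

needed, deprioritised = rank_etl_items(items, access_log)
print(needed)         # ['orders', 'customers']
print(deprioritised)  # ['audit_trail', 'legacy_codes']
```

In this sketch the deprioritised items are kept rather than dropped, so they could still be loaded after the frequently used data has been processed.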

Bibliographic details

Source
IEEE International Conference on Industrial Informatics | 2014 | pp. 472-476 | 5 pages
Authors
Woodall Philip; Jess Torben; Harrison Mark; McFarlane Duncan; Shah Amar; Krechel William; Nicks Eric;
Original format: PDF
Keywords
Data mining; Data models; Data warehouses; Educational institutions; Predictive models; Sensitivity analysis; Transforms; Data warehouse; ETL; Extract transform and load; data overload; detecting unnecessary data; reduce ETL; unnecessary data;
