Instant-On Scientific Data Warehouses: Lazy ETL for Data-Intensive Research


Abstract

At the dawn of the data-intensive research era, scientific discovery relies on data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded in its entirety from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive, and may be unnecessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the cost of data loading: Lazy ETL. Data is extracted and loaded transparently, on the fly, and only for the data items a query requires. Extensive experiments demonstrate a significant reduction in the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the cost of bootstrapping a scientific data warehouse, our approach also reduces the cost of loading newly arriving data.
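The idea of loading only the required data items can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract gives no system details. The class `LazyWarehouse`, the extractor `fake_extract`, and the choice to register cheap metadata eagerly while deferring the actual data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


class LazyWarehouse:
    """Toy warehouse that extracts a source file only when a query needs it."""

    def __init__(self, extract: Callable[[str], List[Record]]) -> None:
        self._extract = extract                    # hypothetical per-file extractor
        self._catalog: Dict[str, dict] = {}        # cheap metadata, registered up front
        self._cache: Dict[str, List[Record]] = {}  # actual data, filled on demand
        self.loads = 0                             # number of files actually extracted

    def register(self, file_id: str, metadata: dict) -> None:
        # Registering a source touches only its metadata, not its contents.
        self._catalog[file_id] = metadata

    def _load(self, file_id: str) -> List[Record]:
        # Extract a file the first time it is needed, then reuse the cached rows.
        if file_id not in self._cache:
            self._cache[file_id] = self._extract(file_id)
            self.loads += 1
        return self._cache[file_id]

    def query(self, wanted: Callable[[dict], bool]) -> List[Record]:
        # Select files by metadata, then load only the selected ones.
        rows: List[Record] = []
        for file_id, meta in self._catalog.items():
            if wanted(meta):
                rows.extend(self._load(file_id))
        return rows


def fake_extract(file_id: str) -> List[Record]:
    # Stand-in for an expensive extract-and-transform step (assumption).
    return [{"value": float(len(file_id))}]


wh = LazyWarehouse(fake_extract)
wh.register("station_a", {"station": "A"})
wh.register("station_b", {"station": "B"})

rows = wh.query(lambda m: m["station"] == "A")
```

In this sketch, a query restricted to station A triggers extraction of one file only; the file for station B stays untouched until some later query asks for it, and repeating the same query reuses the cached rows.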
