Instant-On Scientific Data Warehouses: Lazy ETL for Data-Intensive Research


Abstract

At the dawn of the data-intensive research era, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the costs of data loading: Lazy ETL. Data is extracted and loaded transparently, on the fly, only for the required data items. Extensive experiments demonstrate a significant reduction of the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the costs of bootstrapping a scientific data warehouse, our approach also reduces the costs of loading new incoming data.
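The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of loading data on demand, not the authors' system. The class `LazyWarehouse`, the helper `make_extractor`, and the source names are all hypothetical. Registering a source reads nothing; a source is extracted the first time a query touches it, and the result is cached for later queries.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, float]


class LazyWarehouse:
    """Hypothetical warehouse that loads a source only when a query needs it."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[], List[Record]]] = {}
        self._loaded: Dict[str, List[Record]] = {}
        self.extract_calls = 0  # number of sources actually extracted so far

    def register(self, source_id: str, extractor: Callable[[], List[Record]]) -> None:
        # Registering is cheap: no data is read here.
        self._extractors[source_id] = extractor

    def _load(self, source_id: str) -> List[Record]:
        # Extract on first access, then reuse the cached rows.
        if source_id not in self._loaded:
            self.extract_calls += 1
            self._loaded[source_id] = self._extractors[source_id]()
        return self._loaded[source_id]

    def query(
        self,
        source_ids: Iterable[str],
        predicate: Callable[[Record], bool],
    ) -> List[Record]:
        # Only the sources named by the query are ever loaded.
        rows: List[Record] = []
        for source_id in source_ids:
            rows.extend(r for r in self._load(source_id) if predicate(r))
        return rows


def make_extractor(values: List[float]) -> Callable[[], List[Record]]:
    # Stand-in for reading and parsing an external file (assumption).
    return lambda: [{"value": v} for v in values]


warehouse = LazyWarehouse()
warehouse.register("station_a", make_extractor([1.0, 5.0, 9.0]))
warehouse.register("station_b", make_extractor([2.0, 4.0]))
warehouse.register("station_c", make_extractor([7.0]))

# Only station_a is extracted; station_b and station_c stay untouched.
result = warehouse.query(["station_a"], lambda r: r["value"] > 4.0)
```

In this sketch the cost of a query depends on the sources it names rather than on the total number of registered sources, which is the effect the abstract attributes to loading only the required data items.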


Bibliographic Information

Source: International workshop on enabling real-time business intelligence; International conference on very large databases | 2013 | pp. 60-75 | 16 pages
Authors: Yagiz Kargin; Holger Pirk; Milena Ivanova; Stefan Manegold; Martin Kersten
Original format: PDF

Similar Literature

1. Improving data-intensive EDA performance with annotation-driven laziness [J]. Quirino Zagarese, Gerardo Canfora, Eugenio Zimeo. Science of Computer Programming, 2015, issue pta2.
2. Efficient location-aware data placement for data-intensive applications in geo-distributed scientific data centers [J]. Jinghui Zhang, Jian Chen, Junzhou Luo. Tsinghua Science and Technology, 2016, no. 5.
3. Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers [J]. Jinghui Zhang, Jian Chen, Junzhou Luo. 清华大学学报（英文版）, 2016, no. 005.
4. Instant-On Scientific Data Warehouses Lazy ETL for Data-Intensive Research [C]. Yagiz Kargin, Holger Pirk, Milena Ivanova. International workshop on enabling real-time business intelligence, 2013.
5. Spatio-Temporal Data Warehousing for Exploratory Analysis of Scientific Data [D]. Zhao Jing (趙菁). 2019.
6. Parameterized Specification Configuration and Execution of Data-Intensive Scientific Workflows [O]. Vijay S. Kumar, Tahsin Kurc, Varun Ratnakar.
7. Efficient location-aware data placement for data-intensive applications in geo-distributed scientific data centers [O]. Jinghui Zhang, Junzhou Luo, Aibo Song. 2016.
