Update Propagation in a Streaming Warehouse

Abstract

Streaming warehouses are used to monitor complex systems such as data centers, web site complexes, and world-wide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data for deep analysis and real-time data for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally lead to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms that correctly propagate updates, and examples of algorithms that do not. We extend these results to accommodate the needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only those update propagation algorithms that impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.

Bibliographic details

Source: Scientific and Statistical Database Management, 2011, pp. 129-149 (21 pages)
Conference location: Portland, OR (US)
Authors: Theodore Johnson; Vladislav Shkapenyuk
Author affiliations: AT&T Labs - Research (both authors)
Original format: PDF
Language: English
Chinese Library Classification: TP311.13
