Conference: Scientific and Statistical Database Management

Update Propagation in a Streaming Warehouse


Abstract

Streaming warehouses are used to monitor complex systems such as data centers, web site complexes, and world-wide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data for deep analysis and real-time data for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally lead to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms that correctly propagate updates, and examples of algorithms that do not. We extend these results to accommodate the needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only those update propagation algorithms that impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.
