Update Propagation in a Streaming Warehouse

Abstract

Streaming warehouses are used to monitor complex systems such as data centers, web site complexes, and world-wide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data for deep analysis and real-time data for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally lead to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms that correctly propagate updates, and examples of algorithms that do not. We extend these results to accommodate the needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only those update propagation algorithms that impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.