Scalable Scheduling of Updates in Streaming Data Warehouses

Golab L.

Abstract

We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of interarrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time (at time t, if a table has been updated with information up to some earlier time r, its staleness is t minus r). We then propose a scheduling framework that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different interarrival times and data volumes among different sources, and transient overload. A novel feature of our framework is that scheduling decisions do not depend on properties of update jobs (such as deadlines), but rather on the effect of update jobs on data staleness. Finally, we present a suite of update scheduling algorithms and extensive simulation experiments to map out factors which affect their performance.
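The staleness definition above (at time t, a table updated with information up to time r has staleness t minus r) can be illustrated with a minimal Python sketch. The names `Table`, `UpdateJob`, `run_time`, and the greedy rule in `pick_next` are assumptions made for illustration only; they are not the algorithms presented in the paper. The sketch only shows what it might look like to base a scheduling decision on a job's effect on staleness rather than on a job property such as a deadline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Table:
    name: str
    freshness: float  # r: the table holds information up to this time


@dataclass
class UpdateJob:
    table: Table
    data_up_to: float  # after the job runs, the table is current up to here
    run_time: float    # hypothetical processing cost of the load


def staleness(table: Table, now: float) -> float:
    """Staleness as defined in the abstract: t minus r."""
    return now - table.freshness


def staleness_reduction(job: UpdateJob) -> float:
    """How much running the job would lower its table's staleness."""
    return max(0.0, job.data_up_to - job.table.freshness)


def pick_next(jobs: list[UpdateJob]) -> UpdateJob:
    """Illustrative greedy rule (an assumption, not the paper's method):
    choose the job with the largest staleness reduction per unit of run time."""
    return max(jobs, key=lambda j: staleness_reduction(j) / j.run_time)


def run(job: UpdateJob, now: float) -> float:
    """Run a job to completion (updates are not preempted); return finish time."""
    job.table.freshness = max(job.table.freshness, job.data_up_to)
    return now + job.run_time
```

For example, with one table fresh up to time 0 and a slow load, and another fresh up to time 8 and a fast load, this rule would run the fast load first because it removes more staleness per unit of processing time. Other rules (for instance, weighting tables by priority) would fit the same structure.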

Bibliographic details

Source: Knowledge and Data Engineering, IEEE Transactions on, 2012, no. 6, pp. 1092-1105 (14 pages)
Author: Golab L.
Original format: PDF
Language: English
