首页> 外文期刊>Theory of computing systems >Scheduling to Minimize Stateness and Stretch in Real-Time Data Warehouses
【24h】

Scheduling to Minimize Stateness and Stretch in Real-Time Data Warehouses

机译:计划以最小化实时数据仓库中的状态和扩展

获取原文
获取原文并翻译 | 示例

摘要

We study scheduling algorithms for loading data feeds into real time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. Data for each table in the warehouse are generated at a constant rate, different tables possibly at different rates. For each data feed, the arrival of new data triggers an update that seeks to append the new data to the corresponding table; if multiple updates are pending for the same table, they are batched together before being loaded. At time r, if a table has been updated with information up to time r ≤ τ, its staleness is defined as τ - r. Our first objective is to schedule the updates on one or more processors in a way that minimizes the total staleness. In order to ensure fairness, our second objective is to limit the maximum "stretch", which we define (roughly) as the ratio between the duration of time an update waits till it is finished being processed, and the length of the update. In contrast to earlier work proving the nonexistence of constant-competitive algorithms for related scheduling problems, we prove that any online nonpreemptive algorithm, no processor of which is ever voluntarily idle, incurs a staleness at most a constant factor larger than an obvious lower bound on total staleness (provided that the processors are sufficiently fast). We give a constant-stretch algorithm, provided that the processors are sufficiently fast, for the quasiperiodic model, in which tables can be clustered into a few groups such that the update frequencies within each group vary by at most a constant factor. Finally, we show that our constant-stretch algorithm is also constant-competitive (subject to the same proviso on processor speed) in the quasiperiodic model with respect to total weighted staleness, where tables are assigned weights that reflect their priorities.
机译:我们研究了用于将数据提要加载到实时数据仓库中的调度算法,该算法用于IP网络监控,在线金融交易和信用卡欺诈检测等应用中。在这些应用程序中,仓库收集了大量由外部源生成并异步到达的流数据提要。仓库中每个表的数据均以恒定的速率生成,不同的表可能以不同的速率生成。对于每个数据馈送,新数据的到达都会触发一个更新,该更新试图将新数据附加到相应的表中。如果同一表有多个待处理的更新,则在加载前将它们分批处理。在时间r处,如果某个表已更新了直到时间r≤τ的信息,则其陈旧性定义为τ-r。我们的第一个目标是以最大程度地减少陈旧性的方式在一个或多个处理器上安排更新。为了确保公平,我们的第二个目标是限制最大“伸展”,我们将其(大致)定义为更新等待完成为止的持续时间与更新长度之间的比率。与先前的工作证明不变竞争算法不存在相关调度问题的工作形成对比,我们证明,任何在线非抢占算法(其处理器都没有永远处于空闲状态)最多会导致陈旧性大于一个明显的下限。完全过时(前提是处理器足够快)。对于拟周期模型,如果处理器足够快,我们将给出一个恒定拉伸算法,在该模型中,表可以聚集成几组,每组中的更新频率最多变化一个恒定因子。最后,我们证明了在总周期陈旧性方面,拟周期模型中的恒定拉伸算法也是恒定竞争的(服从处理器速度的相同条件),其中表分配了反映其优先级的权重。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号