Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

DieHard: Reliable Scheduling to Survive Correlated Failures in Cloud Data Centers

Abstract

In large-scale data centers, a single fault can cause several physical machines, and the tasks running on them, to fail simultaneously. Such correlated failures can severely damage the reliability of a service or a job. This paper models the impact of stochastic and correlated failures on job reliability in a data center. We focus on correlated failures caused by power outages or failures of network components, and on jobs that run multiple replicas of identical tasks. We present a statistical reliability model and an approximation technique for computing a job's reliability in the presence of correlated failures. We also address the problem of scheduling a job under reliability constraints, formulating it as an optimization problem whose goal is to achieve the desired reliability with the minimum number of extra tasks. We present a scheduling algorithm that approximates the minimum number of required tasks and a placement that achieves the desired job reliability. We evaluate the efficiency of the algorithm analytically and by simulating a cluster with different failure sources and reliabilities. The results show that the algorithm effectively approximates the minimum number of extra tasks required to achieve the job's target reliability.
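
To make the idea of replica placement under correlated failures concrete, here is a minimal Python sketch. It is not the paper's model or algorithm: it assumes a simplified setting in which each rack (a hypothetical stand-in for a shared power or network failure domain) fails with its own probability, each machine additionally fails independently, every replica runs on a distinct machine, and the job survives if at least one replica survives. The names `job_reliability` and `greedy_placement` are illustrative, and the greedy loop is a simple heuristic that is not guaranteed to find the minimum number of replicas.

```python
from math import prod


def job_reliability(placement, rack_fail, machine_fail):
    """Probability that at least one replica survives (assumed model).

    placement[r]  : number of replicas placed in rack r
    rack_fail[r]  : probability that rack r fails as a whole (correlated failure)
    machine_fail  : probability that a single machine fails on its own
    """
    # All replicas in rack r are lost if the rack fails, or if the rack
    # stays up but every one of its n machines fails independently.
    all_replicas_lost = prod(
        q + (1.0 - q) * machine_fail ** n
        for n, q in zip(placement, rack_fail)
    )
    return 1.0 - all_replicas_lost


def greedy_placement(rack_fail, machine_fail, target, max_replicas=100):
    """Add replicas one at a time, each in the rack that helps reliability most."""
    placement = [0] * len(rack_fail)
    while job_reliability(placement, rack_fail, machine_fail) < target:
        if sum(placement) >= max_replicas:
            raise ValueError("target reliability not reachable within max_replicas")

        def reliability_if_added(r):
            trial = list(placement)
            trial[r] += 1
            return job_reliability(trial, rack_fail, machine_fail)

        best_rack = max(range(len(placement)), key=reliability_if_added)
        placement[best_rack] += 1
    return placement


if __name__ == "__main__":
    racks = [0.1, 0.1]  # assumed rack-level failure probabilities
    plan = greedy_placement(racks, machine_fail=0.2, target=0.9)
    print(plan, job_reliability(plan, racks, 0.2))
```

With two racks that each fail with probability 0.1 and a machine failure probability of 0.2, two replicas in the same rack give a reliability of 0.864, while one replica per rack gives 0.9216. Spreading replicas across failure domains therefore reaches a 0.9 target with fewer tasks, which is the kind of trade-off the scheduling problem in the abstract optimizes.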