Conference: Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on

Task Scheduling Algorithm for Multicore Processor System for Minimizing Recovery Time in Case of Single Node Fault



Abstract

In this paper, we propose a task scheduling algorithm for a multicore processor system that reduces the recovery time after a single fail-stop failure of a multicore processor. Many recently developed processors have multiple cores on a single die, so one failure of a computing node results in the failure of many processors. When a multicore processor fails, all tasks that were executed on it have to be recovered at once. The proposed algorithm is based on an existing checkpointing technique, and we assume that the state is saved when a node sends its results to the next node. If a series of computations in which each step depends on earlier results is executed on a single die, every part of that series must be executed again when the processor fails. The proposed scheduling algorithm therefore tries not to concentrate tasks on the processors of one die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. Compared with an existing scheduling method, in the simulations the execution time including recovery time in the case of a node failure was reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.
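The reasoning about dependent computations on a single die can be illustrated with a small sketch. This is not the authors' scheduling algorithm; it is a toy model built on assumptions that do not come from the paper: the tasks form one linear dependency chain, a result is checkpointed only when it is handed to a task placed on a different node, and the lost work is the longest run of consecutive tasks placed on the failed node. The function name `worst_case_recovery` and the two placements are hypothetical.

```python
from typing import Sequence


def worst_case_recovery(
    costs: Sequence[float],
    placement: Sequence[int],
    failed_node: int,
) -> float:
    """Return the largest amount of work that must be redone if failed_node dies.

    Assumed toy model (not taken from the paper):
    - tasks form a linear chain, task i depends on task i - 1;
    - state is saved only when a result moves to a different node;
    - so a failure loses at most one run of consecutive tasks on the failed node.
    """
    if len(costs) != len(placement):
        raise ValueError("costs and placement must have the same length")

    worst = 0.0
    run = 0.0  # cost of the current uncheckpointed run on failed_node
    for cost, node in zip(costs, placement):
        if node == failed_node:
            run += cost
            worst = max(worst, run)
        else:
            # The result crossed to another node, so the state was saved.
            run = 0.0
    return worst


costs = [1.0] * 8
concentrated = [0, 0, 0, 0, 1, 1, 1, 1]  # chain packed onto one node at a time
spread = [0, 1, 0, 1, 0, 1, 0, 1]        # chain alternates between two nodes

print(worst_case_recovery(costs, concentrated, failed_node=0))  # 4.0
print(worst_case_recovery(costs, spread, failed_node=0))        # 1.0
```

In this model, spreading the chain across nodes shortens the run that has to be re-executed, at the price of more inter-node transfers. The sketch does not model that communication cost, which is presumably one source of the no-failure overhead the abstract reports.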
