Journal: Parallel Computing

Contention Awareness and Fault-Tolerant Scheduling for Precedence Constrained Tasks in Heterogeneous Systems


Abstract

Heterogeneous distributed systems are widely deployed for executing computationally intensive parallel applications with diverse computing needs. Such environments require effective scheduling strategies that take into account both algorithmic and architectural characteristics. Unfortunately, most of the scheduling algorithms developed for such systems rely on a simple platform model in which communication contention is not taken into account. In addition, it is generally assumed that processors never fail. To schedule precedence graphs in a more realistic framework, we first introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of supporting an arbitrary number of fail-silent (fail-stop) processor failures. Next, we derive a more complex heuristic that departs from the main principle of the first algorithm. Instead of considering a single task (the one with the highest priority) and assigning all its replicas to the currently best available resources, we consider a chunk of ready tasks and assign all their replicas in the same decision-making procedure. This leads to a better load balance across processors and communication links. We focus on a bi-criteria approach, in which we aim to minimize the total execution time, or latency, given a fixed number of failures supported in the system. Our algorithms have a low time complexity and drastically reduce the number of additional communications induced by the replication mechanism. Experimental results demonstrate the usefulness of the proposed algorithms, which lead to efficient execution schemes while guaranteeing a prescribed level of fault tolerance.
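The principle of the first algorithm (take the highest-priority ready task and place all its replicas on the currently best resources) can be illustrated with a small sketch. This is not the authors' algorithm: it leaves out communication costs and link contention entirely, and the priority rule, the conservative data-ready rule, and all names (`schedule_with_replication`, `exec_time`, `preds`, `epsilon`) are assumptions made for illustration. The only idea taken from the abstract is that tolerating `epsilon` fail-stop failures requires `epsilon + 1` replicas of each task on distinct processors.

```python
from typing import Dict, List, Tuple

Replica = Tuple[int, float, float]  # (processor index, start time, finish time)


def schedule_with_replication(
    exec_time: Dict[str, List[float]],  # exec_time[t][p]: runtime of task t on processor p
    preds: Dict[str, List[str]],        # preds[t]: tasks that must complete before t
    epsilon: int,                       # number of fail-stop failures to tolerate
) -> Dict[str, List[Replica]]:
    """Greedy list scheduling with epsilon + 1 replicas per task.

    Illustrative only: communication time and contention are ignored.
    """
    num_procs = len(next(iter(exec_time.values())))
    if epsilon + 1 > num_procs:
        raise ValueError("need at least epsilon + 1 processors")

    proc_free = [0.0] * num_procs       # time at which each processor becomes idle
    placement: Dict[str, List[Replica]] = {}
    remaining = set(exec_time)

    while remaining:
        ready = [t for t in remaining
                 if all(p in placement for p in preds.get(t, []))]
        if not ready:
            raise ValueError("precedence graph contains a cycle")

        # Assumed priority rule: largest total execution time first,
        # with ties broken by task name for determinism.
        task = max(sorted(ready), key=lambda t: sum(exec_time[t]))

        # Conservative assumption: a replica may start only after every
        # replica of every predecessor has finished, since any one of
        # them could be the sole survivor after failures.
        data_ready = max(
            (finish for p in preds.get(task, []) for _, _, finish in placement[p]),
            default=0.0,
        )

        # Evaluate the finish time on each processor, then keep the
        # epsilon + 1 processors that finish earliest (all distinct).
        candidates = []
        for proc in range(num_procs):
            start = max(proc_free[proc], data_ready)
            candidates.append((start + exec_time[task][proc], proc, start))
        candidates.sort()

        replicas = []
        for finish, proc, start in candidates[: epsilon + 1]:
            proc_free[proc] = finish
            replicas.append((proc, start, finish))

        placement[task] = replicas
        remaining.remove(task)

    return placement


def latency_upper_bound(placement: Dict[str, List[Replica]]) -> float:
    """Latest finish time over all replicas, an upper bound on the latency."""
    return max(finish for reps in placement.values() for _, _, finish in reps)
```

Calling `schedule_with_replication` with `epsilon=1` on a platform of three processors places two copies of every task on different processors, so the loss of any single processor still leaves one copy of each task. The second heuristic described in the abstract would instead take a chunk of ready tasks and place all their replicas in one decision step; that variant is not sketched here.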
