Journal: Parallel Computing

Distributed dynamic load balancing for pipelined computations on heterogeneous systems

Abstract

One of the most significant causes of performance degradation in scientific and engineering applications on high performance computing systems is the uneven distribution of computational work across the system's resources. This effect, known as load imbalance, is even more pronounced for irregular applications and heterogeneous distributed systems. It has motivated the parallel and distributed computing research community to focus on methods that provide good load balancing for scientific and engineering applications running on (heterogeneous) distributed systems. Efficient load balancing and scheduling methods are employed for scientific applications from fields such as mechanics, materials, physics, chemistry, biology and applied mathematics. Such applications typically employ a large number of computational methods to simulate complex phenomena over very large scales of time and magnitude. These simulations consist of routines that perform repetitive computations (in the form of DO/FOR loops) over very large data sets and, if not properly implemented and executed, may suffer from poor performance. The number of repetitive computations in the simulation codes is not always constant. Moreover, the computations themselves may in fact be irregular, so that some iterations take (unpredictably) longer than others. To produce successful and timely results, large scale simulations require large scale computing systems, which are often widely distributed and highly heterogeneous. Such systems are also usually shared among multiple users, which makes the quality and quantity of the available resources highly unpredictable. There are numerous load balancing methods in the literature for different parallel architectures. 
The most recent of these methods typically follow the master-worker paradigm, in which a single coordinator (the master) makes all scheduling decisions based on information provided by the workers. Depending on the application requirements, the scheduling policy and the computational environment, the benefits of this paradigm may be limited in two ways: (1) its efficiency may not scale as the number of processors increases, and (2) scheduling decisions are quite likely to be based on outdated information, especially on systems where the workload changes rapidly. To address these limitations, we propose a distributed (master-less) load balancing scheme in which the workers themselves make the scheduling decisions in a distributed fashion. We implemented this scheme, along with two master-worker schemes (a previously existing one and a recently modified one), for three different scientific computational kernels. To validate the usefulness and efficiency of the proposed scheme, we conducted a series of comparative performance tests against the two master-worker schemes for each kernel. The target system is an SMP cluster, on which we simulated three different patterns of system load fluctuation. The experiments strongly support the view that the distributed approach offers better performance and scalability on such systems, with overall improvements ranging from 13% to 24% over the master-worker approaches.
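
To make the master-worker versus master-less distinction concrete, here is a minimal sketch of master-less loop self-scheduling in Python. It is an illustration under stated assumptions, not the scheme proposed in the paper, whose details the abstract does not give. Threads stand in for workers, and a lock-protected counter (the hypothetical `SharedLoopState`) replaces the communication an SMP cluster would need. There is no coordinator: each worker computes its own chunk size with the well-known guided self-scheduling rule ceil(R / P), where R is the number of remaining iterations and P the number of workers. The names `claim_chunk`, `run_masterless` and `irregular_body` are hypothetical.

```python
import threading
from typing import Callable, List, Tuple


class SharedLoopState:
    """Hypothetical shared state: only the index of the next unassigned
    iteration. No master thread exists; workers update it under a lock."""

    def __init__(self, n_iterations: int) -> None:
        self.next_index = 0
        self.n = n_iterations
        self.lock = threading.Lock()


def claim_chunk(state: SharedLoopState, n_workers: int,
                min_chunk: int = 1) -> Tuple[int, int]:
    """Worker-side scheduling decision (guided self-scheduling):
    claim ceil(remaining / n_workers) iterations, at least min_chunk."""
    with state.lock:
        remaining = state.n - state.next_index
        if remaining <= 0:
            return (state.n, state.n)  # empty range: loop is finished
        size = max(min_chunk, -(-remaining // n_workers))  # ceiling division
        size = min(size, remaining)
        start = state.next_index
        state.next_index += size
        return (start, start + size)


def run_masterless(n_iterations: int, n_workers: int,
                   body: Callable[[int], None]) -> List[List[Tuple[int, int]]]:
    """Run body(i) for every i in [0, n_iterations); return the chunks
    each worker claimed, indexed by worker id."""
    state = SharedLoopState(n_iterations)
    log: List[List[Tuple[int, int]]] = [[] for _ in range(n_workers)]

    def worker(wid: int) -> None:
        while True:
            start, end = claim_chunk(state, n_workers)
            if start >= end:
                return
            for i in range(start, end):
                body(i)
            log[wid].append((start, end))  # each worker writes only its own list

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    n = 1000
    results = [0] * n

    def irregular_body(i: int) -> None:
        # Hypothetical kernel with uneven per-iteration cost.
        results[i] = sum(range((i % 7) * 1000))

    chunks = run_masterless(n, 4, irregular_body)
    for wid, claimed in enumerate(chunks):
        print(f"worker {wid}: {len(claimed)} chunks, "
              f"{sum(e - s for s, e in claimed)} iterations")
```

Because a faster or less loaded worker returns for work sooner, it claims more chunks, which absorbs both irregular iteration costs and heterogeneous processor speeds without a coordinator. In a master-worker variant, each `claim_chunk` call would instead be a request to the master, which is where the scalability and stale-information concerns listed above come into play.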