Conference: IEEE International Symposium on Computer Architecture and High Performance Computing

Towards Communication Profile, Topology and Node Failure Aware Process Placement

Abstract

HPC systems need to keep growing in size to meet the ever-increasing demand for high levels of capability and capacity, often in tight time windows for urgent computation. However, increasing the size, complexity and heterogeneity of HPC systems also increases the risk and impact of system failures, which result in wasted resources and aborted jobs. A major contributor to job completion time is the cost of interprocess communication. To address performance and energy efficiency, several prior studies have targeted improved communication locality: they derive a mapping of MPI processes to system nodes that reduces communication cost. However, such approaches disregard the effect of system failures. In this work, we propose a resource allocation approach for MPI jobs that considers both high performance and error resilience. Our approach, named Communication Profile, Topology and node Failure (CPTF), takes into account the application's communication profile, the system topology and the node failure probability when assigning job processes to nodes. We evaluate variants of CPTF through simulations of two MPI applications, one with a regular communication pattern (LAMMPS) and one with an irregular pattern (NPB-DT). In both cases, the variant of CPTF that strives to avoid failure-prone nodes and communication paths completes job batches in less time than the default resource allocation policy of Slurm. It also exhibits the lowest ratio of aborted jobs. The average improvement in batch completion time is 67% for NPB-DT and 34% for LAMMPS.
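
The abstract does not say how CPTF combines its three inputs, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a placement is scored as the traffic-weighted hop count between communicating processes plus a penalty proportional to the probability that at least one allocated node fails, with node failures treated as independent. The names `placement_cost`, `best_placement`, `penalty`, the line topology and all numeric values are hypothetical, and the exhaustive search is only practical for toy sizes.

```python
from itertools import permutations


def placement_cost(mapping, comm, hops, p_fail, penalty=100.0):
    """Score a placement; lower is better.

    mapping[i] is the node hosting process i (hypothetical representation).
    comm[i][j] is the traffic volume between processes i and j.
    hops[a][b] is the topology distance between nodes a and b.
    p_fail[a] is the assumed failure probability of node a.
    """
    n = len(mapping)
    # Communication term: traffic weighted by the distance it has to travel.
    comm_cost = sum(
        comm[i][j] * hops[mapping[i]][mapping[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )
    # Failure term: probability that any used node fails (independence assumed).
    survive = 1.0
    for node in set(mapping):
        survive *= 1.0 - p_fail[node]
    return comm_cost + penalty * (1.0 - survive)


def best_placement(num_procs, nodes, comm, hops, p_fail, penalty=100.0):
    """Exhaustively search one-process-per-node placements (toy sizes only)."""
    return min(
        permutations(nodes, num_procs),
        key=lambda m: placement_cost(m, comm, hops, p_fail, penalty),
    )


# Hypothetical example: four nodes on a line, node 0 is failure-prone.
nodes = [0, 1, 2, 3]
hops = [[abs(a - b) for b in nodes] for a in nodes]
p_fail = [0.5, 0.01, 0.01, 0.01]
# Processes 0 and 1 exchange most of the traffic.
comm = [
    [0, 10, 1],
    [10, 0, 1],
    [1, 1, 0],
]

chosen = best_placement(3, nodes, comm, hops, p_fail)
print("chosen placement:", chosen)
```

With these made-up numbers the failure penalty outweighs any locality gain from node 0, so the search keeps all three processes on the more reliable nodes while still placing the two heavily communicating processes next to each other. Setting `penalty=0.0` reduces the score to a pure communication-locality objective, which is the kind of mapping the abstract says earlier work targets.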
