Fast and Scalable Startup of MPI Programs in InfiniBand Clusters

Abstract

One of the major challenges in parallel computing on large-scale clusters is fast and scalable process startup, which can typically be divided into two phases: process initiation and connection setup. In this paper, we characterize the startup of MPI programs in InfiniBand clusters and identify two startup scalability issues: serialized process initiation in the initiation phase and high communication overhead in the connection setup phase. To reduce the connection setup time, we have developed two approaches: one uses data reassembly to reduce data volume, and the other uses a bootstrap channel to parallelize the communication. Furthermore, a process management framework, the Multi-Purpose Daemons (MPD) system, is exploited to speed up process initiation. Our experimental results show that job startup time is improved by more than 4 times for 128-process jobs, and our analytical models suggest that the improvement can exceed two orders of magnitude for 2048-process jobs.
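The abstract does not spell out how data reassembly works, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that each of n processes creates one endpoint identifier per peer, that a central launcher collects all of them, and that "reassembly" means the launcher sends each process only the identifiers addressed to it rather than the whole table. The names `reassemble`, `naive_entries`, and `reassembled_entries` are hypothetical.

```python
from typing import Dict, List

# Assumption: table[i][j] is the endpoint id that process i created for
# peer j. The diagonal (i == j) is unused and holds -1 as a placeholder.
Table = List[List[int]]


def reassemble(table: Table) -> List[Dict[int, int]]:
    """Regroup the collected table by destination process.

    The result for process j maps each peer i to table[i][j], i.e. only
    the n-1 ids that process j needs to connect to its peers.
    """
    n = len(table)
    return [
        {i: table[i][j] for i in range(n) if i != j}
        for j in range(n)
    ]


def naive_entries(n: int) -> int:
    """Entries sent if every process receives the full n*(n-1) table."""
    return n * n * (n - 1)


def reassembled_entries(n: int) -> int:
    """Entries sent if every process receives only its own n-1 ids."""
    return n * (n - 1)
```

Under this simplified model the launcher's outgoing volume shrinks by a factor of n, since `naive_entries(n) // reassembled_entries(n)` equals `n`. This is meant only to show why regrouping data by destination can reduce volume; it makes no claim about the measured speedups reported in the abstract.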
