High Performance MPI on IBM 12x InfiniBand Architecture

Abstract

InfiniBand is becoming increasingly popular in cluster computing because of its open standard and high performance. I/O interfaces such as PCI-Express and GX+ are being introduced as next-generation technologies to drive InfiniBand at very high throughput. HCAs with 8x throughput on PCI-Express have become available, and support for HCAs with 12x throughput on GX+ has recently been announced. In this paper, we design a Message Passing Interface (MPI) implementation for IBM 12x dual-port HCAs, which provide multiple send/recv engines per port. We propose several communication scheduling policies (binding, striping, and round robin) and study their impact. Based on this study, we present a new policy, EPC (enhanced point-to-point and collective), which takes the communication pattern into account when transferring data: point-to-point (blocking and non-blocking) and collective communication. We implement our design and evaluate it with micro-benchmarks, collective communication, and the NAS Parallel Benchmarks. Using EPC on a 12x InfiniBand cluster with one HCA and one port, we improve performance by 41% on the ping-pong latency test and by 63-65% on the unidirectional and bidirectional bandwidth tests, compared with the default single-rail MPI implementation. Our evaluation on the NAS Parallel Benchmarks shows a 7-13% improvement in execution time for Integer Sort and Fourier Transform.
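The abstract names three scheduling policies without defining them. The sketch below shows one common reading of these terms in multi-rail communication, where a "rail" stands in for a send/recv engine. It is an illustration under assumptions, not the paper's implementation: the function names, the `(rail, bytes)` plan format, and the even-split rule for striping are all hypothetical, and EPC itself is not modeled.

```python
# Illustrative only: assumed meanings of binding, round robin, and striping.
# Each function returns, per message, a list of (rail_index, byte_count) pairs.

def binding(msg_sizes, num_rails, rail=0):
    """Send every message, whole, on one fixed rail (assumed default: rail 0)."""
    return [[(rail, size)] for size in msg_sizes]

def round_robin(msg_sizes, num_rails):
    """Send each message whole, moving to the next rail per message."""
    return [[(i % num_rails, size)] for i, size in enumerate(msg_sizes)]

def striping(msg_sizes, num_rails):
    """Split each message into near-equal chunks, one chunk per rail."""
    plans = []
    for size in msg_sizes:
        base, extra = divmod(size, num_rails)
        # The first `extra` rails carry one additional byte each.
        chunks = [(r, base + (1 if r < extra else 0)) for r in range(num_rails)]
        # Skip rails that would receive an empty chunk (tiny messages).
        plans.append([(r, n) for r, n in chunks if n > 0])
    return plans
```

Under this reading, striping spreads a single large message over every engine, round robin only helps when several messages are in flight, and binding uses one engine per stream. That gives one intuition for why a policy might choose among them according to the communication pattern.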
