Cluster Computing

Improved MPI collectives for MPI processes in shared address spaces

Abstract

As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of MPI collectives for clusters of NUMA nodes. We develop performance models for collective communication using shared memory, and we demonstrate several algorithms for various collectives. Experiments are conducted on both Xeon X5650 and Opteron 6100 InfiniBand clusters. The measurements agree with the model and indicate that different algorithms dominate for short and long vectors. We compare our shared-memory allreduce with several MPI implementations (Open MPI, MPICH2, and MVAPICH2) that utilize system shared memory to facilitate interprocess communication. On a 16-node Xeon cluster and an 8-node Opteron cluster, our implementation achieves geometric-mean speedups of 2.3X and 2.1X, respectively, over the best of these MPI implementations. Our techniques enable efficient implementations of collective operations on future multi- and manycore systems.
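The hierarchical pattern the abstract alludes to (reduce within each node through shared memory, then combine across nodes) can be sketched with standard MPI-3 shared-memory windows. The C sketch below is not the paper's implementation and makes its own simplifying choices (a node-leader reduction and a fixed vector length N); it only illustrates how intranode shared memory and internode messaging compose into an allreduce.

/* A minimal, hedged sketch of a hierarchical shared-memory allreduce.
 * NOT the paper's algorithm; it only illustrates the structure: ranks on
 * one node expose their vectors through an MPI-3 shared window, the node
 * leader reduces them locally, leaders combine across nodes, and the
 * result is broadcast back within each node. */
#include <mpi.h>
#include <stdio.h>

#define N 4   /* vector length, arbitrary for this sketch */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that can share physical memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Each rank contributes N doubles to one shared window. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared((MPI_Aint)(N * sizeof(double)), sizeof(double),
                            MPI_INFO_NULL, node_comm, &mine, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    for (int i = 0; i < N; i++) mine[i] = world_rank + i;
    MPI_Win_sync(win);          /* make local stores visible */
    MPI_Barrier(node_comm);     /* every rank has written its vector */
    MPI_Win_sync(win);          /* observe the other ranks' stores */

    /* The node leader reduces straight out of shared memory. */
    double local_sum[N], result[N];
    if (node_rank == 0) {
        for (int i = 0; i < N; i++) local_sum[i] = 0.0;
        for (int r = 0; r < node_size; r++) {
            MPI_Aint sz; int du; double *buf;
            MPI_Win_shared_query(win, r, &sz, &du, &buf);
            for (int i = 0; i < N; i++) local_sum[i] += buf[i];
        }
    }
    MPI_Win_unlock_all(win);

    /* Leaders combine across nodes; everyone else waits for the bcast. */
    MPI_Comm leaders;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   0, &leaders);
    if (node_rank == 0) {
        MPI_Allreduce(local_sum, result, N, MPI_DOUBLE, MPI_SUM, leaders);
        MPI_Comm_free(&leaders);
    }
    MPI_Bcast(result, N, MPI_DOUBLE, 0, node_comm);

    if (world_rank == 0)
        printf("result[0] = %g\n", result[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

A production implementation would likely partition long vectors so that every rank on the node reduces a disjoint chunk in parallel instead of serializing at the leader; the abstract's observation that different algorithms dominate for short and long vectors reflects exactly this kind of trade-off.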