Journal: Parallel Computing

Exploiting heterogeneity of communication channels for efficient GPU selection on multi-GPU nodes

Abstract

Multi-GPU nodes have become the platform of choice for scientific applications. In a multi-GPU node, GPUs are interconnected via different communication channels, so intranode communications among GPUs may traverse different paths with different latency and bandwidth characteristics. As the number of GPUs within a multi-GPU node increases, the physical topology of the GPU interconnects tends to have more levels of hierarchy, which in turn increases the heterogeneity of the GPU communication channels. In this paper, we show that the performance of different intranode GPU communication channels can differ considerably. Accordingly, we propose a topology-aware GPU selection scheme for efficient assignment of GPUs to the MPI processes within a node. The resulting assignment helps to improve communication performance by mapping more intensive inter-process GPU-to-GPU communications onto the stronger communication channels. We leverage three metrics in our scheme to distinguish among different GPU-to-GPU communication channels: latency, bandwidth, and distance. We evaluate our scheme through extensive experiments conducted on a 16-GPU node, and show that it can provide considerable performance improvements over the default GPU selection scheme. In particular, we achieve up to 70% and 21% performance improvement at the microbenchmark and application level, respectively. (C) 2017 Elsevier B.V. All rights reserved.
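
To make the idea of mapping intensive communications onto stronger channels concrete, here is a minimal sketch in Python. It is not the scheme from the paper, whose algorithm the abstract does not describe. It assumes a hypothetical `assign_gpus` helper that exhaustively searches process-to-GPU assignments and scores each one by communication volume times channel bandwidth. The 4-GPU bandwidth matrix and the communication matrix are made-up illustrative values, and only bandwidth is used out of the three metrics named above.

```python
from itertools import permutations

def assign_gpus(comm, bandwidth):
    """Hypothetical helper: pick the process-to-GPU assignment that maximizes
    sum(comm[i][j] * bandwidth[gpu_i][gpu_j]) over all process pairs.
    Exhaustive search, so only practical for a handful of GPUs."""
    n = len(comm)
    best_score, best = float("-inf"), None
    for perm in permutations(range(len(bandwidth)), n):
        # perm[i] is the GPU given to process i
        score = sum(comm[i][j] * bandwidth[perm[i]][perm[j]]
                    for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best_score, best = score, perm
    return list(best), best_score

# Illustrative (made-up) channel strengths: GPUs {0, 1} and {2, 3} each share
# a fast link (10); traffic between the two groups uses a slower path (4).
bandwidth = [
    [0, 10, 4, 4],
    [10, 0, 4, 4],
    [4, 4, 0, 10],
    [4, 4, 10, 0],
]

# Illustrative (made-up) communication volumes between 4 MPI processes:
# processes 0 and 2 talk heavily, as do processes 1 and 3.
comm = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]

assignment, score = assign_gpus(comm, bandwidth)
print(assignment, score)
```

With these inputs, a default rank-order assignment (process i on GPU i) would place both heavy pairs on the slower path, whereas the search places each heavy pair on GPUs that share a fast link. A real implementation would need measured latency, bandwidth, and distance values for the node and a heuristic that scales beyond exhaustive search.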