Conference: IEEE International Parallel and Distributed Processing Symposium Workshops

GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python


Abstract

As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.2x, 11.7x, and 17.4x in Charm++, AMPI, and Charm4py, respectively. We also observe increases in bandwidth of up to 9.6x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating a proxy application for the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.
