Journal of Parallel and Distributed Computing

FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures


Abstract

This paper addresses the challenges of MPI derived datatype processing and proposes FALCON-X, a Fast and Low-overhead Communication framework for optimized zero-copy intra-node derived datatype communication on emerging CPU/GPU architectures. We quantify performance bottlenecks such as memory layout translation and copy overheads for highly fragmented MPI datatypes, and propose novel pipelining and memoization-based designs to achieve efficient derived datatype communication. We also propose enhancements to the MPI standard to address its semantic limitations. Experimental evaluations show that the proposed designs significantly improve intra-node communication latency and bandwidth over state-of-the-art MPI libraries on modern CPU and GPU systems. Using representative application kernels such as MILC, WRF, NAS_MG, Specfem3D, and Stencils on three different CPU architectures and two different GPU systems, including DGX-2, we demonstrate up to 5.5x improvement on multi-core CPUs and 120x on the DGX-2 GPU system over state-of-the-art designs in other MPI libraries.
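
To make "memory layout translation" and "memoization" concrete, here is a minimal sketch in plain Python. It is not the FALCON-X implementation and uses no MPI library. The names `flatten_vector` and `gather` are hypothetical, and the strided layout only mimics what an `MPI_Type_vector`-style datatype describes. The sketch translates a strided layout into a list of (byte offset, byte length) segments and caches the result, so a repeated transfer with the same layout skips the translation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def flatten_vector(count: int, blocklength: int, stride: int, elem_size: int):
    """Translate a strided layout into (byte_offset, byte_length) segments.

    Hypothetical helper: `count` blocks of `blocklength` elements, with block
    starts `stride` elements apart. lru_cache memoizes the result per layout.
    """
    return tuple(
        (i * stride * elem_size, blocklength * elem_size)
        for i in range(count)
    )


def gather(buf: bytes, segments) -> bytes:
    """Collect the described segments; used here only to show what they select."""
    return b"".join(buf[off:off + length] for off, length in segments)


# A 4x4 row-major matrix of 1-byte elements; the layout selects its first column.
matrix = bytes(range(16))
column = flatten_vector(count=4, blocklength=1, stride=4, elem_size=1)
first_column = gather(matrix, column)

# Same layout again: served from the cache, no re-translation.
column_again = flatten_vector(count=4, blocklength=1, stride=4, elem_size=1)
```

The `gather` step copies data into a contiguous buffer purely for illustration. A zero-copy design, as the abstract describes, would aim to avoid such an intermediate copy; how FALCON-X does so is not detailed in the abstract.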
