Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints


Abstract

Modern high-speed interconnection networks are designed with capabilities to support communication from multiple processor cores. The MPI endpoints extension has been proposed to ease process and thread count tradeoffs by enabling multithreaded MPI applications to efficiently drive independent network communication. In this work, we present the first implementation of the MPI endpoints interface and demonstrate the first applications running on this new interface. We use a novel library-based design that can be layered on top of any existing, production MPI implementation. Our approach uses proxy processes to isolate threads in an MPI job, eliminating threading overheads within the MPI library and allowing threads to achieve process-like communication performance. We evaluate the performance advantages of our implementation through several benchmarks and kernels. Performance results for the Lattice QCD Dslash kernel indicate that endpoints provides up to 2.9× improvement in communication performance and 1.87× overall performance improvement over a highly optimized hybrid MPI+OpenMP baseline on 128 processors.
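As a rough illustration of what giving threads process-like communication could look like, the sketch below computes a rank layout in which every thread-owned endpoint gets its own rank. The abstract does not describe the interface at this level of detail, so the layout rule (ranks assigned contiguously, ordered by the rank of the owning process) is an assumption, and `endpoint_ranks` is a hypothetical helper rather than part of any MPI library.

```python
from itertools import accumulate


def endpoint_ranks(endpoints_per_process):
    """Map each parent process to the ranks its endpoints would hold.

    Assumption (not stated in the abstract): endpoint ranks are assigned
    contiguously, ordered by the rank of the owning process.
    """
    # Running totals give the first endpoint rank owned by each process.
    offsets = [0, *accumulate(endpoints_per_process)]
    return [
        list(range(offsets[p], offsets[p + 1]))
        for p in range(len(endpoints_per_process))
    ]


# Three processes requesting 2, 3, and 1 endpoints respectively.
layout = endpoint_ranks([2, 3, 1])
for parent_rank, ranks in enumerate(layout):
    print(f"process {parent_rank} owns endpoint ranks {ranks}")
```

Under this assumed layout, a thread bound to one endpoint can be addressed by peers exactly as a separate process would be, which is the property the abstract attributes to the endpoints interface.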
