Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Advanced Thread Synchronization for Multithreaded MPI Implementations

Abstract

Concurrent multithreaded access to the Message Passing Interface (MPI) is becoming increasingly important for supporting emerging hybrid MPI applications. The interoperability between threads and MPI, however, is complex and makes efficient implementations nontrivial. Prior studies showed that threads waiting for communication progress (waiting threads) often interfere with others (active threads) and degrade their progress. This situation occurs when both classes of threads compete for the same MPI resource and passing ownership to a waiting thread does not guarantee that communication advances. The best-known practical solution prioritizes active threads and applies first-in-first-out arbitration within each class. This approach, however, suffers from residual wasted resource acquisitions (waste) and ignores data locality, thus resulting in poor scalability. In this work, we propose thread synchronization improvements to eliminate waste while preserving data locality in a production MPI implementation. First, we leverage MPI knowledge and a fast synchronization method to eliminate waste and accelerate progress. Second, we rely on a cooperative progress model that dynamically elects and restricts a single waiting thread to drive a communication context for improved data locality. Third, we prioritize active threads and synchronize them with a locality-preserving lock that is hierarchical and exploits unbounded bias for high throughput. Results show significant improvement in synthetic microbenchmarks and two MPI+OpenMP applications.
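The arbitration policy that the abstract attributes to the best-known prior solution (active threads first, first-in-first-out within each class) can be illustrated with a small sketch. This is not the paper's implementation or any MPI library's code: the class name `TwoClassFifoLock`, the class labels "active" and "waiting", and the `demo` helper are hypothetical names chosen for illustration, and Python threads stand in for the threads of a hybrid MPI application.

```python
import threading
import time
from collections import deque


class TwoClassFifoLock:
    """Hypothetical sketch: active waiters are served before waiting ones,
    and each class is served in FIFO order."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the fields below
        self._held = False
        self._queues = {"active": deque(), "waiting": deque()}

    def acquire(self, cls):
        with self._mutex:
            if not self._held:
                # Lock is free, so no one is queued; take it immediately.
                self._held = True
                return
            gate = threading.Event()
            self._queues[cls].append(gate)
        gate.wait()  # ownership is handed over directly by release()

    def release(self):
        with self._mutex:
            # Prefer the active class, then fall back to the waiting class.
            for cls in ("active", "waiting"):
                if self._queues[cls]:
                    self._queues[cls].popleft().set()  # hand off; stays held
                    return
            self._held = False

    def queued(self):
        with self._mutex:
            return sum(len(q) for q in self._queues.values())


def demo():
    lock = TwoClassFifoLock()
    order = []
    lock.acquire("active")  # main thread holds the lock so workers queue up

    def worker(name, cls):
        lock.acquire(cls)
        order.append(name)  # executed while owning the lock
        lock.release()

    arrivals = [("w1", "waiting"), ("a1", "active"),
                ("w2", "waiting"), ("a2", "active")]
    threads = []
    for count, (name, cls) in enumerate(arrivals, start=1):
        t = threading.Thread(target=worker, args=(name, cls))
        t.start()
        while lock.queued() < count:  # make the arrival order deterministic
            time.sleep(0.001)
        threads.append(t)

    lock.release()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(demo())
```

In this sketch a thread from the waiting class receives the lock only when no active thread is queued. Every handoff to a waiting thread still costs one acquisition whether or not communication can advance, which is the residual waste the abstract describes, and the policy takes no account of where the lock's data last resided.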