Advanced Thread Synchronization for Multithreaded MPI Implementations

Abstract

Concurrent multithreaded access to the Message Passing Interface (MPI) is gaining importance to support emerging hybrid MPI applications. The interoperability between threads and MPI, however, is complex and renders efficient implementations nontrivial. Prior studies showed that threads waiting for communication progress (waiting threads) often interfere with others (active threads) and degrade their progress. This situation occurs when both classes of threads compete for the same MPI resource and ownership passing to waiting threads does not guarantee communication to advance. The best-known practical solution prioritizes active threads and adapts first-in-first-out arbitration within each class. This approach, however, suffers from residual wasted resource acquisitions (waste) and ignores data locality, thus resulting in poor scalability. In this work, we propose thread synchronization improvements to eliminate waste while preserving data locality in a production MPI implementation. First, we leverage MPI knowledge and a fast synchronization method to eliminate waste and accelerate progress. Second, we rely on a cooperative progress model that dynamically elects and restricts a single waiting thread to drive a communication context for improved data locality. Third, we prioritize active threads and synchronize them with a locality-preserving lock that is hierarchical and exploits unbounded bias for high throughput. Results show significant improvement in synthetic microbenchmarks and two MPI+OpenMP applications.

Bibliographic details

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2017 | pp. 314-324 | 11 pages
Authors
Hoang-Vu Dang; Sangmin Seo; Abdelhalim Amer; Pavan Balaji
Original format
PDF
Keywords
Message systems; Yarn; Synchronization; Benchmark testing; Safety; Context; Production

Similar literature

1. Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints [J]. Khaled Hamidouche. Computing Reviews. 2015, No. 6
2. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming [J]. Pavan Balaji, Darius Buntinas, David Goodell. International Journal of High Performance Computing Applications. 2010, No. 1
3. TeamWork: Synchronizing Threads Globally to Detect Real Deadlocks for Multithreaded Programs [J]. Yan Cai, Ke Zhai, Shangru Wu. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2013, No. 8
4. Advanced Thread Synchronization for Multithreaded MPI Implementations [C]. Hoang-Vu Dang, Sangmin Seo, Abdelhalim Amer. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2017
5. Adaptive Dynamic Thread Scheduling for Simultaneous Multithreaded Architectures with a Detector Thread [D]. Shin, Chulho. 2002
6. Improving OPC UA Publish-Subscribe Mechanism over UDP with Synchronization Algorithm and Multithreading Broker Application [O]. Alexandru Ioana, Adrian Korodi. 2020
7. Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming [O]. Pavan Balaji, Darius Buntinas, David Goodell. 2016
