International Conference for High Performance Computing, Networking, Storage and Analysis

Distributed wait state tracking for runtime MPI deadlock detection

Abstract

The widely used Message Passing Interface (MPI) with its multitude of communication functions is prone to usage errors. Runtime error detection tools aid in the removal of these errors. We develop MUST as one such tool that provides a wide variety of automatic correctness checks. Its correctness checks can be run in a distributed mode, except for its deadlock detection. This limitation applies to a wide range of tools that either use centralized detection algorithms or a timeout approach. In order to provide scalable and distributed deadlock detection with detailed insight into deadlock situations, we propose a model for MPI blocking conditions that we use to formulate a distributed algorithm. This algorithm implements scalable MPI deadlock detection in MUST. Stress tests at up to 4,096 processes demonstrate the scalability of our approach. Finally, overhead results for a complex benchmark suite demonstrate an average runtime increase of 34% at 2,048 processes.
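The abstract does not spell out the blocking-condition model or the distributed algorithm, so the following is only a minimal, centralized sketch of the general idea behind wait-state-based deadlock detection, not the method implemented in MUST. It assumes each blocked rank waits either for all of its targets ("AND", for example a blocking receive from a specific rank) or for any one of them ("OR", for example a wildcard receive), and that a rank that is not blocked will eventually provide the operation its waiters need. The names `Wait` and `find_deadlocked` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wait:
    """Hypothetical record of what a blocked rank is waiting for."""
    targets: frozenset  # ranks this rank waits for
    mode: str = "AND"   # "AND": needs all targets; "OR": any one target suffices


def find_deadlocked(num_ranks, waits):
    """Return the set of ranks that can never be released.

    `waits` maps a blocked rank to its Wait record; ranks missing from
    `waits` are treated as not blocked (an assumption of this sketch).
    """
    # Ranks that are not blocked can make progress immediately.
    released = {r for r in range(num_ranks) if r not in waits}
    changed = True
    while changed:  # repeat until no further rank can be released
        changed = False
        for rank, wait in waits.items():
            if rank in released:
                continue
            hits = wait.targets & released
            if wait.mode == "AND":
                ok = hits == wait.targets  # every target must be released
            else:
                ok = bool(hits)            # one released target is enough
            if ok:
                released.add(rank)
                changed = True
    # Whatever could not be released is deadlocked.
    return set(range(num_ranks)) - released


# Ranks 0 and 1 each wait for the other; rank 2 is not blocked.
cycle = {0: Wait(frozenset({1})), 1: Wait(frozenset({0}))}
deadlocked_cycle = find_deadlocked(3, cycle)

# Rank 0 waits for rank 1 OR rank 2; rank 1 waits for rank 0; rank 2 is free.
wildcard = {0: Wait(frozenset({1, 2}), "OR"), 1: Wait(frozenset({0}))}
deadlocked_wildcard = find_deadlocked(3, wildcard)
```

In the first scenario ranks 0 and 1 remain unreleased, while in the second the wildcard wait lets rank 0 proceed through rank 2, which in turn releases rank 1. A distributed detector such as the one the abstract describes would have to reach the same conclusion without gathering all wait records in one place.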
