Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Distributed wait state tracking for runtime MPI deadlock detection

Abstract

The widely used Message Passing Interface (MPI) with its multitude of communication functions is prone to usage errors. Runtime error detection tools aid in the removal of these errors. We develop MUST as one such tool that provides a wide variety of automatic correctness checks. Its correctness checks can be run in a distributed mode, except for its deadlock detection. This limitation applies to a wide range of tools that either use centralized detection algorithms or a timeout approach. In order to provide scalable and distributed deadlock detection with detailed insight into deadlock situations, we propose a model for MPI blocking conditions that we use to formulate a distributed algorithm. This algorithm implements scalable MPI deadlock detection in MUST. Stress tests at up to 4,096 processes demonstrate the scalability of our approach. Finally, overhead results for a complex benchmark suite demonstrate an average runtime increase of 34% at 2,048 processes.
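The idea of modeling blocking conditions can be illustrated with a small wait-for analysis. The sketch below is not the distributed algorithm from the paper and does not reflect how MUST is implemented; it is a centralized, simplified illustration under assumptions made here. The `Wait` type, its `mode` field, and `find_deadlocked` are hypothetical names. An "AND" wait stands for a rank that needs all of its targets (for example a collective), an "OR" wait for a rank that needs any one of them (for example a receive with `MPI_ANY_SOURCE`). The sketch also assumes that any rank able to make progress eventually satisfies every wait that targets it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Wait:
    """Hypothetical blocking condition of one rank (an assumption, not MUST's model)."""
    targets: FrozenSet[int]
    mode: str = "AND"  # "AND": needs all targets; "OR": needs any one target


def find_deadlocked(num_ranks: int, waits: Dict[int, Wait]) -> Set[int]:
    """Return the ranks that can never be released under the stated assumptions."""
    # Ranks without an entry in `waits` are not blocked and can make progress.
    can_progress = {r for r in range(num_ranks) if r not in waits}
    changed = True
    while changed:
        changed = False
        for rank, wait in waits.items():
            if rank in can_progress:
                continue
            hits = [t in can_progress for t in wait.targets]
            satisfied = all(hits) if wait.mode == "AND" else any(hits)
            if satisfied:
                # Every (AND) or some (OR) target can progress, so this rank can too.
                can_progress.add(rank)
                changed = True
    # Whatever was never released is reported as deadlocked.
    return set(range(num_ranks)) - can_progress


# Ranks 0 and 1 each wait for the other; rank 2 is not blocked.
cyclic = {0: Wait(frozenset({1})), 1: Wait(frozenset({0}))}
# Rank 0 accepts a message from rank 1 or rank 2; rank 1 waits for rank 0.
wildcard = {0: Wait(frozenset({1, 2}), "OR"), 1: Wait(frozenset({0}))}

print(find_deadlocked(3, cyclic))
print(find_deadlocked(3, wildcard))
```

In the first case ranks 0 and 1 remain unreleased and are reported; in the second, rank 2 releases rank 0 through the OR condition, which in turn releases rank 1. A central analysis of this kind needs the wait state of every rank in one place, which is the scalability limitation the abstract attributes to centralized detection.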
