Published in: ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Accurate Application Progress Analysis for Large-Scale Parallel Debugging



Abstract

Debugging large-scale parallel applications is challenging. In most HPC applications, parallel tasks progress in a coordinated fashion, so a fault in one task can quickly propagate to other tasks, making it difficult to debug. Finding the least-progressed tasks can significantly reduce the effort needed to identify the task where the fault originated. However, existing approaches for detecting such tasks suffer from low accuracy and large overheads: they either use imprecise static analysis or cannot infer progress dependence inside loops. We present a loop-aware progress-dependence analysis tool, PRODOMETER, which determines relative progress among parallel tasks via dynamic analysis. Our fault-injection experiments suggest that its accuracy and precision are over 90% for most cases and that it scales well up to 16,384 MPI tasks. Further, our case study shows that it helped significantly in diagnosing a perplexing error in MPI that manifested only at large scale.
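
To make the idea of "least-progressed tasks" concrete, the toy sketch below compares tasks by a loop-aware progress record. It is an illustration only and not PRODOMETER's actual algorithm, which the abstract does not describe in detail. The record layout (loop iteration counts, outermost loop first, followed by a position inside the loop body), the function names, and the sample values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical progress record (an assumption, not the tool's data model):
#   (iteration counts of the enclosing loops, outermost first; position in the loop body)
Progress = Tuple[Tuple[int, ...], int]


def least_progressed(progress: Dict[int, Progress]) -> List[int]:
    """Return the ranks with the smallest record, comparing loop counts first."""
    lowest = min(progress.values())  # tuples compare lexicographically
    return sorted(rank for rank, rec in progress.items() if rec == lowest)


def least_progressed_ignoring_loops(progress: Dict[int, Progress]) -> List[int]:
    """Naive variant that looks only at the position inside the loop body."""
    lowest = min(pos for _, pos in progress.values())
    return sorted(rank for rank, (_, pos) in progress.items() if pos == lowest)


# Made-up snapshot of four tasks, keyed by rank.
snapshot: Dict[int, Progress] = {
    0: ((12, 3), 5),
    1: ((12, 3), 2),
    2: ((11, 7), 9),  # one outer iteration behind, but late in the loop body
    3: ((12, 0), 1),
}

loop_aware = least_progressed(snapshot)
naive = least_progressed_ignoring_loops(snapshot)
print("loop-aware:", loop_aware)
print("ignoring loops:", naive)
```

In this made-up snapshot the loop-aware comparison singles out rank 2, which is still in an earlier outer iteration, while the naive comparison picks rank 3 because it sits at the earliest position in the loop body. That gap is the kind of inaccuracy the abstract attributes to approaches that cannot infer progress inside loops.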
