Parallelizing heavyweight debugging tools with mpiecho

Abstract

Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to be scaled out to tens of millions of cores. In order to allow development of more scalable debugging idioms, we introduce mpiecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can be useful in isolating the source of hardware-based nondeterministic behavior, and provide a case study based on a recent processor bug at LLNL. While total overhead will depend on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how mpiecho can lead to near-linear reduction in overhead when combined with Maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reduction from 1466% to 53% and from 740% to 14% for cg (class D, 64 processes) and lu (class D, 64 processes), respectively, using only 64 additional cores.
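The mechanism the abstract describes is easiest to see in miniature. The following is a minimal sketch, not the authors' mpiecho implementation: the clone count CLONES_PER_RANK and the communicator name clone_comm are assumptions for illustration. It partitions MPI_COMM_WORLD so that each logical rank owns a group of clones, gives each clone a 1/C slice of a hypothetical tool's checking work, and compares results across clones in the spirit of the hardware fault isolation study.

    #include <mpi.h>
    #include <stdio.h>

    #define CLONES_PER_RANK 4   /* assumed clone count C per logical rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Map each world rank to (logical rank, clone id). */
        int logical = world_rank / CLONES_PER_RANK;
        int clone   = world_rank % CLONES_PER_RANK;

        /* Clones of the same logical rank share a communicator; in a
         * full system the leader (clone 0) would perform the real MPI
         * traffic and replay received messages to the other clones so
         * that all clones execute identically. */
        MPI_Comm clone_comm;
        MPI_Comm_split(MPI_COMM_WORLD, logical, clone, &clone_comm);

        /* Tool parallelization: each clone instruments only its 1/C
         * slice of the workload, e.g. a disjoint range of watched
         * addresses, so the tool's overhead is divided across clones. */
        printf("logical rank %d, clone %d: instrumenting slice %d/%d\n",
               logical, clone, clone, CLONES_PER_RANK);

        /* Fault isolation: identical inputs must yield identical
         * results, so any spread across clones points at
         * nondeterministic hardware. local_sum stands in for a real
         * checksum of computed data. */
        unsigned long local_sum = 42UL * (unsigned long)logical;
        unsigned long min_sum, max_sum;
        MPI_Allreduce(&local_sum, &min_sum, 1, MPI_UNSIGNED_LONG,
                      MPI_MIN, clone_comm);
        MPI_Allreduce(&local_sum, &max_sum, 1, MPI_UNSIGNED_LONG,
                      MPI_MAX, clone_comm);
        if (clone == 0 && min_sum != max_sum)
            fprintf(stderr, "logical rank %d: clones diverged\n", logical);

        MPI_Comm_free(&clone_comm);
        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpirun -np 8, this gives two logical ranks of four clones each; the abstract's 512x tool parallelization corresponds to 512-way cloning in this scheme.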

Bibliographic information

  • Source
    Parallel Computing | 2013, Issue 3 | pp. 156-166 | 11 pages
  • Author affiliations

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Department of Computer Science, University of Arizona, Tucson, AZ 85721, United States;

    Google, Inc.;

    Department of Computer Science, University of Colorado, Boulder, CO 80309, United States;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Keywords

    MPI; dynamic binary instrumentation; heavyweight tools;

