Parallelizing heavyweight debugging tools with mpiecho

Abstract

Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to be scaled out to tens of millions of cores. In order to allow development of more scalable debugging idioms, we introduce mpiecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can be useful in isolating the source of hardware-based nondeterministic behavior, and provide a case study based on a recent processor bug at LLNL. While total overhead will depend on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how mpiecho can lead to near-linear reduction in overhead when combined with Maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reduction from 1466% to 53% and from 740% to 14% for cg (class D, 64 processes) and lu (class D, 64 processes), respectively, using only 64 additional cores.
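The mechanism the abstract describes is easiest to see in miniature. The following is a minimal sketch, not the authors' mpiecho implementation: the clone count CLONES_PER_RANK and the communicator name clone_comm are assumptions for illustration. It partitions MPI_COMM_WORLD so that each logical rank owns a group of clones, gives each clone a 1/C slice of a hypothetical tool's checking work, and compares results across clones in the spirit of the hardware fault isolation study.

    #include <mpi.h>
    #include <stdio.h>

    #define CLONES_PER_RANK 4   /* assumed clone count C per logical rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Map each world rank to (logical rank, clone id). */
        int logical = world_rank / CLONES_PER_RANK;
        int clone   = world_rank % CLONES_PER_RANK;

        /* Clones of the same logical rank share a communicator; in a
         * full system the leader (clone 0) would perform the real MPI
         * traffic and replay received messages to the other clones so
         * that all clones execute identically. */
        MPI_Comm clone_comm;
        MPI_Comm_split(MPI_COMM_WORLD, logical, clone, &clone_comm);

        /* Tool parallelization: each clone instruments only its 1/C
         * slice of the workload, e.g. a disjoint range of watched
         * addresses, so the tool's overhead is divided across clones. */
        printf("logical rank %d, clone %d: instrumenting slice %d/%d\n",
               logical, clone, clone, CLONES_PER_RANK);

        /* Fault isolation: identical inputs must yield identical
         * results, so any spread across clones points at
         * nondeterministic hardware. local_sum stands in for a real
         * checksum of computed data. */
        unsigned long local_sum = 42UL * (unsigned long)logical;
        unsigned long min_sum, max_sum;
        MPI_Allreduce(&local_sum, &min_sum, 1, MPI_UNSIGNED_LONG,
                      MPI_MIN, clone_comm);
        MPI_Allreduce(&local_sum, &max_sum, 1, MPI_UNSIGNED_LONG,
                      MPI_MAX, clone_comm);
        if (clone == 0 && min_sum != max_sum)
            fprintf(stderr, "logical rank %d: clones diverged\n", logical);

        MPI_Comm_free(&clone_comm);
        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpirun -np 8, this gives two logical ranks of four clones each; the abstract's 512x tool parallelization corresponds to 512-way cloning in this scheme.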

Bibliographic information

  • Source
    Parallel Computing | 2013, Issue 3 | pp. 156-166 | 11 pages
  • Author affiliations

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550, United States;

    Department of Computer Science, University of Arizona, Tucson, AZ 85721, United States;

    Google, Inc.;

    Department of Computer Science, University of Colorado, Boulder, CO 80309, United States;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Keywords

    MPI; dynamic binary instrumentation; heavyweight tools;

