Conference: International Workshop on Extreme-Scale Programming Tools

PARLoT: Efficient Whole-Program Call Tracing for HPC Applications


Abstract

The complexity of HPC software and hardware is quickly increasing. As a consequence, the need for efficient execution tracing to gain insight into HPC application behavior is steadily growing. Unfortunately, available tools either do not produce traces with enough detail or incur large overheads. An efficient tracing method that overcomes the tradeoff between maximum information and minimum overhead is therefore urgently needed. This paper presents such a method and tool, called ParLoT, with the following key features. (1) It describes a technique that makes low-overhead on-the-fly compression of whole-program call traces feasible. (2) It presents a new, efficient, incremental trace-compression approach that reduces the trace volume dynamically, which lowers not only the needed bandwidth but also the tracing overhead. (3) It collects all caller/callee relations, call frequencies, call stacks, as well as the full trace of all calls and returns executed by each thread, including in library code. (4) It works on top of existing dynamic binary instrumentation tools, thus requiring neither source-code modifications nor recompilation. (5) It supports program analysis and debugging at the thread, thread-group, and program level. This paper establishes that comparable capabilities are currently unavailable. Our experiments with the NAS parallel benchmarks running on the Comet supercomputer with up to 1,024 cores show that ParLoT can collect whole-program function-call traces at an average tracing bandwidth of just 56 kB/s per core.
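The idea behind features (1) and (2), compressing call and return events as they are produced instead of storing a raw trace, can be illustrated with a small sketch. This is not ParLoT's algorithm, which the abstract does not describe: zlib's streaming compressor is used as a stand-in, and the class name `IncrementalCallTrace`, the 16-bit event encoding, and the convention that ID 0 marks a return are all assumptions made for illustration.

```python
import struct
import zlib


class IncrementalCallTrace:
    """Toy per-thread trace buffer that compresses events as they arrive.

    Assumption: zlib stands in for a purpose-built trace compressor.
    """

    RETURN_ID = 0  # hypothetical convention: 0 = return, IDs >= 1 = functions

    def __init__(self):
        self._compressor = zlib.compressobj(level=6)
        self._chunks = []    # compressed output produced so far
        self.raw_bytes = 0   # uncompressed trace volume, kept for comparison

    def _emit(self, event_id):
        record = struct.pack("<H", event_id)  # 16-bit little-endian event ID
        self.raw_bytes += len(record)
        out = self._compressor.compress(record)  # may return b"" while buffering
        if out:
            self._chunks.append(out)

    def on_call(self, function_id):
        self._emit(function_id)

    def on_return(self):
        self._emit(self.RETURN_ID)

    def finish(self):
        """Flush the compressor and return the complete compressed trace."""
        self._chunks.append(self._compressor.flush())
        return b"".join(self._chunks)


def decode(blob):
    """Recover the sequence of event IDs from a compressed trace."""
    raw = zlib.decompress(blob)
    return [event_id for (event_id,) in struct.iter_unpack("<H", raw)]


def demo():
    """Trace a loop in which function 1 calls function 2, 10,000 times."""
    trace = IncrementalCallTrace()
    for _ in range(10_000):
        trace.on_call(1)
        trace.on_call(2)
        trace.on_return()
        trace.on_return()
    blob = trace.finish()
    return trace.raw_bytes, len(blob), decode(blob)


if __name__ == "__main__":
    raw, compressed, events = demo()
    print(f"raw: {raw} bytes, compressed: {compressed} bytes, events: {len(events)}")
```

Because call sequences in loops are highly repetitive, even a general-purpose compressor shrinks such a trace considerably, and the full event sequence remains recoverable with `decode`. In a real tool the `on_call` and `on_return` hooks would be driven by the binary instrumentation layer rather than called by hand.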
