Journal: Computing in Science & Engineering

Systematic Debugging Methods for Large-Scale HPC Computational Frameworks

Abstract

Parallel computational frameworks for high-performance computing are central to the advancement of simulation-based studies in science and engineering. Unfortunately, finding and fixing bugs in these frameworks can be extremely time-consuming. Left unchecked, these bugs can drastically reduce the amount of new science that can be performed. This article presents a systematic study of the Uintah Computational Framework and of approaches to debugging it more incisively. A key insight is to leverage Uintah's modular structure, which lends itself to systematic debugging. In particular, the authors have developed a new approach based on coalesced stack trace graphs (CSTGs), which summarize system behavior in terms of the key control flows manifested through function invocation chains. They illustrate several scenarios in which CSTGs could help localize bugs efficiently, and present a case study of how they found and fixed a real Uintah bug using CSTGs.
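
To make the idea of a coalesced stack trace graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each stack trace is available as a list of function names ordered from outermost caller to innermost callee, and it coalesces the traces into one graph whose edges are caller-to-callee pairs annotated with occurrence counts. Comparing the graphs of two runs is shown as one plausible way such a summary might help localize a bug. The helper names `build_cstg` and `diff_cstg`, and the function names in the sample traces, are hypothetical.

```python
from collections import Counter


def build_cstg(traces):
    """Coalesce stack traces into a graph: (caller, callee) -> count.

    Assumption: each trace is a list of function names, outermost first.
    """
    edges = Counter()
    for trace in traces:
        # Every adjacent pair of frames is one invocation edge.
        for caller, callee in zip(trace, trace[1:]):
            edges[(caller, callee)] += 1
    return edges


def diff_cstg(first, second):
    """Return edges whose counts differ, as edge -> (first, second)."""
    return {
        edge: (first.get(edge, 0), second.get(edge, 0))
        for edge in sorted(set(first) | set(second))
        if first.get(edge, 0) != second.get(edge, 0)
    }


# Hypothetical traces from a working run and a failing run.
good_traces = [
    ["main", "run", "schedule", "send"],
    ["main", "run", "schedule", "recv"],
]
bad_traces = [
    ["main", "run", "schedule", "send"],
    ["main", "run", "schedule", "send"],
]

good = build_cstg(good_traces)
bad = build_cstg(bad_traces)
delta = diff_cstg(good, bad)

for (caller, callee), (g, b) in delta.items():
    print(f"{caller} -> {callee}: working run={g}, failing run={b}")
```

In this toy input, the edges shared by both runs cancel out and only the two edges below `schedule` remain in `delta`, which narrows attention to the part of the call structure where the runs diverge.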
