Journal: Journal of Parallel and Distributed Computing

Extending a traditional debugger to debug massively parallel applications



Abstract

Beowulf systems, and other proprietary approaches, are placing systems with four or more CPUs in the hands of many researchers and commercial users. In the near future, systems with hundreds of CPUs will become commonly available, with some programmers dealing with tens of thousands of CPUs. The debugging methods used on these systems combine the traditional methods for debugging single processes with ad hoc methods that help the user cope with the multitude of processes. Programmers are usually familiar with a single-process debugger and would like to use it (with minimal user-visible extensions) to debug their distributed programs. We present a set of modifications to a traditional debugger that make it capable of debugging applications running on thousands of processes. Our parallel debugger is composed of individual, fully functional debuggers connected by an n-ary aggregating network. This lets us present the results from every debugger to the user at the same time, in aggregated form. Users get a global view of the application and can easily see whether a given parameter has a value that differs from what they expect or from its value in the other processes. They can then focus on the process sets of interest and investigate the problem. One challenge when debugging thousands of processes is dealing with the amount of output coming from all the debuggers. We present methods that aggregate this overwhelming amount of output into a more manageable subset, which is presented to the user without losing information. Experiments show that the debugger scales to thousands of processors: both the startup mechanism and the response time to user commands scale well. The conclusions regarding the architecture and the scalability of the new parallel debugger are not specific to the serial debugger used in our example implementation.
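The aggregation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the output format, the tree fan-out, or any function names, so `merge`, `aggregate_tree`, `format_ranks`, `render`, the `fanout` parameter, and the sample strings below are all hypothetical. The sketch assumes each per-process debugger returns one line of text, and that identical lines can be collapsed into a single line labelled with the set of ranks that produced it, so no information is dropped.

```python
from collections import defaultdict


def merge(partials):
    """Merge several {output_text: set_of_ranks} maps into one map."""
    merged = defaultdict(set)
    for partial in partials:
        for text, ranks in partial.items():
            merged[text] |= ranks
    return dict(merged)


def aggregate_tree(leaf_outputs, fanout=4):
    """Reduce per-rank outputs level by level, as an n-ary tree might.

    leaf_outputs maps rank -> text printed by that rank's debugger.
    fanout is an assumed tree arity, not a value taken from the paper.
    """
    level = [{text: {rank}} for rank, text in sorted(leaf_outputs.items())]
    while len(level) > 1:
        # Each group of `fanout` children is merged by one parent node.
        level = [merge(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0] if level else {}


def format_ranks(ranks):
    """Compress a non-empty set of ranks into ranges: {0,1,2,5} -> '0-2,5'."""
    ordered = sorted(ranks)
    parts = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def render(aggregated):
    """Produce one line per distinct output, prefixed by its rank set."""
    return sorted(f"[{format_ranks(ranks)}] {text}"
                  for text, ranks in aggregated.items())


if __name__ == "__main__":
    # Eight hypothetical processes; rank 5 disagrees with the others.
    outputs = {rank: "x = 42" for rank in range(8)}
    outputs[5] = "x = 7"
    for line in render(aggregate_tree(outputs)):
        print(line)
    # prints:
    # [0-4,6-7] x = 42
    # [5] x = 7
```

Under these assumptions, eight lines of output shrink to two, and the outlier rank stands out immediately. Because each parent only merges the maps of its children, the work per node depends on the fan-out and the number of distinct outputs rather than on the total number of processes.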
