Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms

Abstract

Programming parallel applications for heterogeneous HPC platforms is much more straightforward when using the task-based programming paradigm. This simplicity arises because the runtime takes care of many activities usually carried out by the application developer, such as task mapping, load balancing, and memory management operations. In this paper, we present a visualization-based performance analysis methodology to investigate the CPU-GPU-Disk memory management of the StarPU runtime, a popular task-based middleware for HPC applications. We detail the design of novel graphical strategies that were fundamental to recognizing performance problems in four case studies. We first identify poor management of data handles when GPU memory is saturated, leading to low application performance. Our experiments using the dense tile-based Cholesky factorization show that our fix leads to performance gains of 66% and better scalability for larger input sizes. In the other three cases, we study scenarios where the main memory is insufficient to store all the application's data, forcing the runtime to store data out-of-core. Using our methodology, we pinpoint behavioral differences among schedulers and identify a crucial problem in the application code regarding initial block placement, which leads to poor performance.
