Journal: Concurrency and Computation: Practice and Experience

A visual performance analysis framework for task-based parallel applications running on hybrid clusters

Abstract

Programming paradigms in High-Performance Computing have been shifting toward task-based models that are capable of adapting readily to heterogeneous and scalable supercomputers. The performance of task-based applications heavily depends on the runtime scheduling heuristics and on the runtime's ability to exploit computing and communication resources. Unfortunately, the traditional performance analysis strategies are unfit to fully understand task-based runtime systems and applications: they expect a regular behavior with communication and computation phases, while task-based applications demonstrate no clear phases. Moreover, the finer granularity of task-based applications typically induces a stochastic behavior that leads to irregular structures that are difficult to analyze. Furthermore, the combination of application structure, scheduler, and hardware information is generally essential to understand performance issues. This paper presents a flexible framework that enables one to combine several sources of information and to create custom visualization panels that help to understand and pinpoint performance problems incurred by bad scheduling decisions in task-based applications. Three case studies using StarPU-MPI, a task-based multi-node runtime system, are detailed to show how our framework can be used to study the performance of the well-known Cholesky factorization. Performance improvements include a better task partitioning among the multi-(GPU, core) to get closer to theoretical lower bounds, improved MPI pipelining in multi-(node, core, GPU) to reduce the slow start, and changes in the runtime system to increase MPI bandwidth, with gains of up to 13% in the total makespan.
