A visual performance analysis framework for task-based parallel applications running on hybrid clusters

Vinícius Garcia Pinto; Lucas Mello Schnorr; Luka Stanisic; Arnaud Legrand; Samuel Thibault; Vincent Danjean


Abstract

Programming paradigms in High-Performance Computing have been shifting toward task-based models that are capable of adapting readily to heterogeneous and scalable supercomputers. The performance of a task-based application heavily depends on the runtime scheduling heuristics and on its ability to exploit computing and communication resources. Unfortunately, traditional performance analysis strategies are unfit to fully understand task-based runtime systems and applications: they expect a regular behavior with communication and computation phases, while task-based applications demonstrate no clear phases. Moreover, the finer granularity of task-based applications typically induces a stochastic behavior that leads to irregular structures that are difficult to analyze. Furthermore, the combination of application structure, scheduler, and hardware information is generally essential to understand performance issues. This paper presents a flexible framework that enables one to combine several sources of information and to create custom visualization panels that help understand and pinpoint performance problems incurred by bad scheduling decisions in task-based applications. Three case studies using StarPU-MPI, a task-based multi-node runtime system, are detailed to show how our framework can be used to study the performance of the well-known Cholesky factorization. Performance improvements include a better task partitioning among the multi-(GPU, core) to get closer to theoretical lower bounds, improved MPI pipelining in multi-(node, core, GPU) to reduce the slow start, and changes in the runtime system to increase MPI bandwidth, with gains of up to 13% in the total makespan.
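As a rough illustration of what "task-based" means for the Cholesky factorization named in the abstract, the sketch below enumerates the tasks of a standard right-looking tiled Cholesky on an n-by-n grid of tiles. It is not taken from the paper and does not use StarPU; the function names `tiled_cholesky_tasks` and `task_counts` are hypothetical, and the kernel names (POTRF, TRSM, SYRK, GEMM) are the usual BLAS/LAPACK labels, assumed here for the illustration.

```python
from collections import Counter


def tiled_cholesky_tasks(n_tiles):
    """Yield (kernel, tile_indices) in sequential submission order.

    Assumption: standard right-looking tiled Cholesky on the lower
    triangle of an n_tiles x n_tiles tile grid. Hypothetical helper,
    not the paper's code.
    """
    for k in range(n_tiles):
        # Factorize the diagonal tile of step k.
        yield ("POTRF", (k, k))
        # Solve against every tile below the diagonal in column k.
        for i in range(k + 1, n_tiles):
            yield ("TRSM", (i, k))
        # Update the trailing submatrix.
        for i in range(k + 1, n_tiles):
            yield ("SYRK", (i, i, k))          # diagonal tile update
            for j in range(k + 1, i):
                yield ("GEMM", (i, j, k))      # off-diagonal tile update


def task_counts(n_tiles):
    """Count how many tasks of each kernel type the factorization submits."""
    return Counter(kernel for kernel, _ in tiled_cholesky_tasks(n_tiles))


if __name__ == "__main__":
    for kernel, count in sorted(task_counts(4).items()):
        print(kernel, count)
```

A runtime system would schedule these tasks dynamically according to their data dependencies rather than in this loop order, which is why traces of such executions show no clear computation and communication phases.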

Bibliographic record

Source
Concurrency and Computation: Practice and Experience | 2018, Issue 18 | e4472.1-e4472.27 | 27 pages
Authors
Vinícius Garcia Pinto; Lucas Mello Schnorr; Luka Stanisic; Arnaud Legrand; Samuel Thibault; Vincent Danjean;
Author affiliations

Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratoire d'Informatique de Grenoble, Université Grenoble Alpes, Inria, CNRS, Grenoble INP, Grenoble, France;

Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil; Laboratoire d'Informatique de Grenoble, Université Grenoble Alpes, Inria, CNRS, Grenoble INP, Grenoble, France;

Max Planck Computing and Data Facility, Garching, Germany;

Laboratoire d'Informatique de Grenoble, Université Grenoble Alpes, Inria, CNRS, Grenoble INP, Grenoble, France;

Inria Bordeaux Sud-Ouest, Bordeaux, France;

Laboratoire d'Informatique de Grenoble, Université Grenoble Alpes, Inria, CNRS, Grenoble INP, Grenoble, France;

Indexing information
Original format: PDF
Language: English
Keywords
Cholesky; heterogeneous platforms; high-performance computing; task-based applications; trace visualization;
