AutoMatch: An automated framework for relative performance estimation and workload distribution on heterogeneous HPC systems


Abstract

Porting sequential applications to heterogeneous HPC systems requires extensive software and hardware expertise to estimate the potential speedup and to use the available compute resources efficiently. To streamline this daunting process, researchers have proposed several “black-box” performance prediction approaches that rely on the performance of a training set of parallel applications. However, because a diverse set of applications with optimized parallel implementations for each architecture type is lacking, the speedup predicted by these approaches is not the speedup upper bound; worse, it can be misleading if the reference parallel implementations are not equally optimized for every target architecture. This paper presents AutoMatch, an automated framework for matching compute kernels to heterogeneous HPC architectures. AutoMatch uses hybrid (static and dynamic) analysis to find the best dependency-preserving parallel schedule of a given sequential code. The resulting operation schedule serves as the basis for constructing a cost function of the optimized parallel execution of the sequential code on heterogeneous HPC nodes. Because this cost function informs the user and the runtime system about the relative execution cost across the different hardware devices within HPC nodes, AutoMatch enables efficient runtime workload distribution that simultaneously utilizes all the available devices in a performance-proportional way. For a set of open-source HPC applications with different characteristics, AutoMatch proves very effective: it identifies the speedup upper bound of sequential applications and how close the parallel implementation is to the best parallel performance across five different HPC architectures. Furthermore, AutoMatch's workload distribution scheme achieves approximately 90% of the performance of a profiling-driven oracle.
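The performance-proportional distribution mentioned in the abstract can be illustrated with a minimal sketch. This is not AutoMatch's implementation: the function name `split_workload`, the device names, and the cost values are all hypothetical, and the sketch simply assumes that each device's share of the work should be proportional to the inverse of its estimated per-item execution cost.

```python
def split_workload(total_items, relative_cost):
    """Split total_items across devices in proportion to 1 / cost.

    relative_cost maps a device name to its estimated execution cost per
    work item (hypothetical values; a lower cost means a faster device).
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not relative_cost or any(c <= 0 for c in relative_cost.values()):
        raise ValueError("every device needs a positive cost")

    # Throughput is the inverse of cost: a device that is 4x cheaper
    # per item should receive 4x as many items.
    throughput = {dev: 1.0 / cost for dev, cost in relative_cost.items()}
    total_throughput = sum(throughput.values())

    # Ideal (fractional) share for each device, rounded down to whole items.
    ideal = {dev: total_items * t / total_throughput
             for dev, t in throughput.items()}
    shares = {dev: int(x) for dev, x in ideal.items()}

    # Hand the items lost to rounding to the devices with the largest
    # fractional remainders, so the shares add up to total_items.
    leftover = total_items - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - shares[d],
                          reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


# Hypothetical costs: the GPU is assumed to be 4x cheaper per item.
print(split_workload(1000, {"cpu": 4.0, "gpu": 1.0}))
# → {'cpu': 200, 'gpu': 800}
```

With such a split, both devices would finish at roughly the same time if the cost estimates were accurate, which is the intuition behind using all devices simultaneously.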


Bibliographic details

Source
2017 IEEE International Symposium on Workload Characterization | 2017 | pp. 32-42 | 11 pages
Conference location: Seattle (US)
Authors
Ahmed E. Helal; Wu-chun Feng; Changhee Jung; Yasser Y. Hanafy;
Author affiliations

Electrical and Computer Eng., Virginia Tech;

Electrical and Computer Eng., Virginia Tech;

Computer Science, Virginia Tech;

Electrical and Computer Eng., Virginia Tech;

Original format: PDF
Text language: English (eng)
Keywords
Computer architecture; Hardware; Synchronization; Performance evaluation; Tools; Analytical models; Prediction algorithms;


Similar literature


1. Bi-Objective Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Performance and Energy Through Workload Distribution [J]. Khaleghzadeh Hamidreza, Fahad Muhammad, Shahid Arsalan. IEEE Transactions on Parallel and Distributed Systems, 2021, No. 3.

2. Workload Distribution Framework for the Parallel Solution of Large Structural Models on Heterogeneous PC Clusters [J]. Ozgur Kurc. Journal of Computing in Civil Engineering, 2010, No. 2.

3. Modelling fracture in heterogeneous materials on HPC systems using a hybrid MPI/Fortran coarray multi-scale CAFE framework [J]. Shterenlikht A., Margetts L., Cebamanos L. Advances in Engineering Software, 2018, No. NOVa.

4. AutoMatch: An automated framework for relative performance estimation and workload distribution on heterogeneous HPC systems [C]. Ahmed E. Helal, Wu-chun Feng, Changhee Jung. IEEE International Symposium on Workload Characterization, 2017.

5. Performance, Energy and Temperature Considerations for Job Scheduling and for Workload Distribution in Heterogeneous Systems [D]. Alsubaihi, Shouq. 2017.

6. A Numerical Analysis of the Cooling Performance of a Hybrid Personal Cooling System (HPCS): Effects of Ambient Temperature and Relative Humidity [O]. Pengjun Xu, Zhanxiao Kang, Faming Wang. 2020.

7. Workload Distribution Framework for the Parallel Solution of Large Structural Models on Heterogeneous PC Clusters [O]. Ozgur Kurc. 2010.
