2017 IEEE International Symposium on Workload Characterization

AutoMatch: An automated framework for relative performance estimation and workload distribution on heterogeneous HPC systems

Abstract

Porting sequential applications to heterogeneous HPC systems requires extensive software and hardware expertise to estimate the potential speedup and to use the available compute resources efficiently. To streamline this daunting process, researchers have proposed several “black-box” performance prediction approaches that rely on the performance of a training set of parallel applications. However, because a diverse set of applications with optimized parallel implementations for each architecture type is lacking, the speedup predicted by these approaches is not the speedup upper bound and, worse, can be misleading if the reference parallel implementations are not equally optimized for every target architecture. This paper presents AutoMatch, an automated framework for matching compute kernels to heterogeneous HPC architectures. AutoMatch uses hybrid (static and dynamic) analysis to find the best dependency-preserving parallel schedule of a given sequential code. The resulting operation schedule serves as the basis for constructing a cost function of the optimized parallel execution of the sequential code on heterogeneous HPC nodes. Because this cost function informs the user and the runtime system about the relative execution cost across the different hardware devices within HPC nodes, AutoMatch enables efficient runtime workload distribution that simultaneously utilizes all available devices in a performance-proportional way. For a set of open-source HPC applications with different characteristics, AutoMatch proves very effective, identifying the speedup upper bound of sequential applications and how close the parallel implementation is to the best parallel performance across five different HPC architectures. Furthermore, AutoMatch's workload distribution scheme achieves approximately 90% of the performance of a profiling-driven oracle.
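
The abstract does not spell out how the performance-proportional distribution is computed. As a minimal sketch of the general idea only, and not the paper's actual scheme, the Python snippet below assumes the cost function yields one relative execution cost per device (estimated time per unit of work) and gives each device a share of the work items inversely proportional to that cost. The function name `proportional_split`, the `est_cost` mapping, and the device labels are hypothetical and chosen purely for illustration.

```python
from typing import Dict


def proportional_split(total_items: int, est_cost: Dict[str, float]) -> Dict[str, int]:
    """Split work items across devices in proportion to 1 / estimated cost.

    est_cost maps a (hypothetical) device label to its relative execution
    cost per work item; a lower cost means a faster device.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not est_cost or any(c <= 0 for c in est_cost.values()):
        raise ValueError("every device needs a positive cost estimate")

    # Relative throughput is the inverse of the relative cost.
    throughput = {dev: 1.0 / cost for dev, cost in est_cost.items()}
    total_throughput = sum(throughput.values())

    # Give each device the integer part of its ideal share.
    shares = {
        dev: int(total_items * t / total_throughput)
        for dev, t in throughput.items()
    }

    # Hand any items lost to rounding to the fastest devices first.
    leftover = total_items - sum(shares.values())
    fastest_first = sorted(throughput, key=throughput.get, reverse=True)
    for dev in fastest_first[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Assumed relative costs: the GPU is estimated to be 4x faster than the CPU.
    print(proportional_split(1000, {"cpu": 4.0, "gpu": 1.0}))
```

With the assumed costs above, the faster device receives four times as many items as the slower one, so under these estimates both would finish at roughly the same time. A real runtime would also have to account for data-transfer costs and estimation error, which this sketch ignores.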
