Conference: 2017 IEEE International Symposium on Performance Analysis of Systems and Software
Crossing the architectural barrier: Evaluating representative regions of parallel HPC applications

Abstract

Exascale computing will get mankind closer to solving important social, scientific and engineering problems. Due to high prototyping costs, High Performance Computing (HPC) system architects make use of simulation models for design space exploration and hardware-software co-design. However, as HPC systems reach exascale proportions, the cost of simulation increases, since simulators themselves are largely single-threaded. Tools for selecting representative parts of parallel applications to reduce running costs are widespread, e.g., BarrierPoint achieves this by analysing, in simulation, abstract characteristics such as basic blocks and reuse distances. However, architectures new to HPC have a limited set of tools available. In this work, we provide an independent cross-architectural evaluation on real hardware - across Intel and ARM - of the BarrierPoint methodology, when applied to parallel HPC proxy applications. We present both cases: when the methodology can be applied and when it cannot. In the former case, results show that we can predict the performance of full application execution by running shorter representative sections. In the latter case, we dive into the underlying issues and suggest improvements. We demonstrate a total simulation time reduction of up to 178x, whilst keeping the error below 2.3% for both cycles and instructions.
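As a rough illustration of predicting full-run metrics from shorter representative sections, the sketch below scales the cycles and instructions measured on each representative region by a weight and sums the results. The `Representative` type, the `weight` field (how many regions a representative stands for), and all numbers are hypothetical assumptions for illustration; they are not taken from the paper and are not the BarrierPoint implementation.

```python
from dataclasses import dataclass


@dataclass
class Representative:
    """One representative region (hypothetical structure, for illustration)."""
    cycles: float        # cycles measured while running only this region
    instructions: float  # instructions measured while running only this region
    weight: float        # assumed: how many regions this one stands for


def estimate_total(reps):
    """Extrapolate whole-run cycles and instructions as weighted sums."""
    cycles = sum(r.cycles * r.weight for r in reps)
    instructions = sum(r.instructions * r.weight for r in reps)
    return cycles, instructions


def relative_error(estimate, measured):
    """Relative error of an estimate against a full-run measurement."""
    return abs(estimate - measured) / measured


# Made-up example values, not results from the paper.
reps = [
    Representative(cycles=1000.0, instructions=800.0, weight=3.0),
    Representative(cycles=500.0, instructions=300.0, weight=2.0),
]
est_cycles, est_instructions = estimate_total(reps)
```

Comparing `est_cycles` and `est_instructions` against counters from a full run with `relative_error` gives the kind of error figure the abstract quotes for cycles and instructions.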
