Published in: IEEE International Symposium on Performance Analysis of Systems and Software

Crossing the architectural barrier: Evaluating representative regions of parallel HPC applications


Abstract

Exascale computing will get mankind closer to solving important social, scientific and engineering problems. Due to high prototyping costs, High Performance Computing (HPC) system architects make use of simulation models for design space exploration and hardware-software co-design. However, as HPC systems reach exascale proportions, the cost of simulation increases, since simulators themselves are largely single-threaded. Tools for selecting representative parts of parallel applications to reduce running costs are widespread, e.g., BarrierPoint achieves this by analysing, in simulation, abstract characteristics such as basic blocks and reuse distances. However, architectures new to HPC have a limited set of tools available. In this work, we provide an independent cross-architectural evaluation on real hardware - across Intel and ARM - of the BarrierPoint methodology, when applied to parallel HPC proxy applications. We present both cases: when the methodology can be applied and when it cannot. In the former case, results show that we can predict the performance of full application execution by running shorter representative sections. In the latter case, we dive into the underlying issues and suggest improvements. We demonstrate a total simulation time reduction of up to 178x, whilst keeping the error below 2.3% for both cycles and instructions.
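The idea of predicting full-application performance from shorter representative sections can be illustrated with a minimal sketch. It is not the BarrierPoint implementation: it assumes the inter-barrier regions have already been grouped into clusters, that one region per cluster has been chosen as its representative, and that each representative is simply scaled by its cluster size. The function names, the cluster assignments, and all cycle counts are hypothetical values made up for illustration.

```python
from collections import defaultdict


def estimate_from_representatives(region_cycles, cluster_of, representative_of):
    """Estimate total cycles by scaling each representative by its cluster size.

    region_cycles     -- measured cycles per inter-barrier region (hypothetical data)
    cluster_of        -- cluster id assigned to each region, same order as region_cycles
    representative_of -- maps cluster id -> index of the region chosen to represent it
    """
    cluster_sizes = defaultdict(int)
    for cluster in cluster_of:
        cluster_sizes[cluster] += 1
    return sum(
        region_cycles[rep] * cluster_sizes[cluster]
        for cluster, rep in representative_of.items()
    )


def relative_error(estimate, measured):
    """Absolute relative error of the estimate against the full-run measurement."""
    return abs(estimate - measured) / measured


# Hypothetical example: six regions that fall into two clusters.
region_cycles = [100, 102, 98, 400, 410, 390]
cluster_of = [0, 0, 0, 1, 1, 1]
representative_of = {0: 1, 1: 4}  # assumed choice: region 1 and region 4

measured_total = sum(region_cycles)
estimated_total = estimate_from_representatives(
    region_cycles, cluster_of, representative_of
)
error = relative_error(estimated_total, measured_total)

print(f"measured={measured_total} estimated={estimated_total} error={error:.1%}")
```

In this toy setup only two of the six regions would need to be executed. The error then depends entirely on how similar the regions within each cluster are, which is one plausible reason a methodology of this kind works for some applications and not for others.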
