Rebalancing the core front-end through HPC code analysis

Abstract

Reaching Exascale in high performance computing (HPC) requires more performance within the same power and area envelope. Today's chip multiprocessor (CMP) designs are tailored to traditional desktop and server workloads, which differ from the parallel applications commonly run in HPC. In this work, we focus on HPC code characteristics and on the processor front-end, which accounts for around 30% of core power and area in the emerging lean-core processors used in HPC. Separating serial from parallel code sections inside applications, we characterize three HPC benchmark suites and compare them to a traditional set of desktop integer workloads. HPC applications have biased, mostly backward-taken branches, small dynamic instruction footprints, and long basic blocks. Our findings suggest smaller branch predictors (BP) with an additional loop BP, smaller branch target buffers (BTB), and smaller L1 instruction caches (I-cache) with wider lines. However, this downsizing applies only to the cores meant to run parallel code. The difference between serial and parallel code sections in HPC applications points to an asymmetric CMP design, with one baseline core for sequential code and many HPC-tailored cores for parallel code. Predictions using the Sniper simulator and McPAT show that an HPC-tailored lean core saves 16% of the core area and 7% of power compared to a baseline core, without performance loss. Using the area savings to add an extra core, an asymmetric CMP with one baseline core and eight tailored cores fits the same area budget as a symmetric CMP composed of eight baseline cores, while demanding 4% more power and providing 12% shorter execution time on average.
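
The area argument in the abstract can be checked with simple arithmetic. The sketch below is only an illustration, not the authors' methodology: it assumes core area is the only quantity that matters (ignoring shared caches, interconnect, and other uncore area), measures everything in units of one baseline core, and uses hypothetical function names. Only the 16% per-core area saving and the core counts come from the abstract.

```python
import math

# Per-core area saving of an HPC-tailored core, as reported in the abstract.
AREA_SAVING = 0.16


def asymmetric_area(n_tailored, area_saving=AREA_SAVING, n_baseline=1):
    """Total core area of an asymmetric CMP, in units of one baseline core.

    Assumption: area scales linearly with core count, uncore is ignored.
    """
    return n_baseline + n_tailored * (1.0 - area_saving)


def max_tailored_cores(area_budget, area_saving=AREA_SAVING, n_baseline=1):
    """Largest number of tailored cores that fits next to the baseline cores."""
    remaining = area_budget - n_baseline
    return math.floor(remaining / (1.0 - area_saving))


if __name__ == "__main__":
    budget = 8.0  # symmetric CMP: eight baseline cores
    print("asymmetric 1+8 area:", round(asymmetric_area(8), 2))
    print("tailored cores that fit:", max_tailored_cores(budget))
    print("without any saving:", max_tailored_cores(budget, area_saving=0.0))
```

Under these assumptions, one baseline core plus eight tailored cores occupies less area than eight baseline cores, whereas without the saving only seven additional cores would fit. This is consistent with the "extra core" described in the abstract. The power and execution-time figures come from the authors' simulations and are not reproduced by this estimate.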