Conference: IEEE International Parallel and Distributed Processing Symposium Workshops
An Energy-Efficient Asymmetric Multi-Processor for HPC Virtualization

Abstract

The Asymmetric Multiprocessor (AMP) architecture brings new opportunities to achieve better trade-offs between performance and operational/financial costs. This paper makes the case for an AMP that addresses poor I/O performance in a virtualized HPC system by using small side-cores to offload I/O processing. We use full machine simulations to explore the micro-architectural parameter space in detail and perform an energy-delay-area analysis, taking into account the relationship between size and access delay in the caches. The simulated side-core model has been validated on the Atom processor, with performance counter metrics being within 11%. This study focuses on TLBs and caches, which our results show to have a remarkable impact on performance. Compared with a previous AMP study that considered only performance and was limited to existing hardware, our results confirm the broad nature of that design, including the preference for an asymmetric 2-way CPU pipeline. Our improved methodology also boosts the degree of confidence in these results. However, we show that the optimal features of an efficient side-core are smaller and simpler L1/L2 caches (16KB 4-way and 16KB 2-way I/D caches and a 128KB 4-way L2 cache) and L1/L2 TLBs (32/48 entry fully associative L1 I/D TLBs and 256 entry 4-way L2 I/D TLBs). Meanwhile, our analysis reveals that a processor module consisting of two big cores and a small side-core of our design can reduce average power, energy, and area by 9.2%, 8%, and 24.4%, respectively, compared with a module of three big cores (the AMD K10), at the cost of a 1.3% performance loss.
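The reported reductions can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' methodology: it assumes the 1.3% performance loss can be read as a 1.3% longer execution time, and it uses simple product forms for energy-delay and energy-delay-area, which the abstract does not define. All function and variable names are hypothetical.

```python
# Ratios of the "two big cores + one side-core" module relative to the
# three-big-core baseline, taken from the reductions stated in the abstract.
POWER_REL = 1 - 0.092   # average power reduced by 9.2%
ENERGY_REL = 1 - 0.08   # energy reduced by 8%
AREA_REL = 1 - 0.244    # area reduced by 24.4%

# Assumption (not stated in the abstract): a 1.3% performance loss is
# treated as a 1.3% increase in execution time.
DELAY_REL = 1 + 0.013


def implied_energy(power_rel: float, delay_rel: float) -> float:
    """Energy ratio implied by energy = average power * execution time."""
    return power_rel * delay_rel


def energy_delay(energy_rel: float, delay_rel: float) -> float:
    """Relative energy-delay product (assumed product form)."""
    return energy_rel * delay_rel


def energy_delay_area(energy_rel: float, delay_rel: float, area_rel: float) -> float:
    """Relative energy-delay-area product (assumed product form)."""
    return energy_rel * delay_rel * area_rel


if __name__ == "__main__":
    print(f"implied energy ratio: {implied_energy(POWER_REL, DELAY_REL):.4f}")
    print(f"relative EDP:         {energy_delay(ENERGY_REL, DELAY_REL):.4f}")
    print(f"relative EDAP:        {energy_delay_area(ENERGY_REL, DELAY_REL, AREA_REL):.4f}")
```

Under these assumptions, the power and delay figures imply an energy ratio of about 0.92, which agrees with the reported 8% energy reduction, and the combined energy-delay-area ratio comes out at roughly 0.70 of the baseline.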
