Conference: IEEE International Parallel and Distributed Processing Symposium Workshops

An Energy-Efficient Asymmetric Multi-Processor for HPC Virtualization


Abstract

The Asymmetric Multiprocessor (AMP) architecture brings new opportunities to achieve better trade-offs between performance and operational/financial costs. This paper presents the case for an AMP that addresses poor I/O performance in a virtualized HPC system by using small side-cores to offload I/O processing. We use full-machine simulations to explore the micro-architectural parameter space in detail and perform an energy-delay-area analysis, taking into account the relationship between size and access delay in the caches. The simulated side-core model has been validated on the Atom processor, with performance counter metrics within 11%. The study focuses on TLBs and caches, which our results show to have a remarkable impact on performance. Compared with a previous AMP study that considered only performance and was limited to existing hardware, our results confirm the broad nature of that design, including the preference for an asymmetric 2-way CPU pipeline. Our improved methodology also increases the degree of confidence in these results. However, we show that the optimal features of an efficient side-core are smaller and simpler L1/L2 caches (16KB 4-way and 16KB 2-way I/D caches and a 128KB 4-way L2 cache) and L1/L2 TLBs (32/48-entry fully associative L1 I/D TLBs and 256-entry 4-way L2 I/D TLBs). Our analysis also reveals that a processor module consisting of two big cores and one small side-core of our design can reduce average power, energy, and area by 9.2%, 8%, and 24.4%, respectively, compared with a module of three big cores (the AMD K10), at the cost of a 1.3% performance loss.
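To make the reported trade-off concrete, the figures above can be combined into relative energy-delay and energy-delay-area products. This is only a rough sketch: the abstract does not state how the paper defines its energy-delay-area metric, so the plain products used here, the function name `relative_metrics`, and the assumption that delay scales as the inverse of performance are all illustrative choices, not the authors' method.

```python
# Reductions reported in the abstract, relative to a module of three big cores.
POWER_REDUCTION = 0.092   # 9.2% lower average power
ENERGY_REDUCTION = 0.08   # 8% lower energy
AREA_REDUCTION = 0.244    # 24.4% smaller area
PERF_LOSS = 0.013         # 1.3% performance loss


def relative_metrics(power_reduction, energy_reduction, area_reduction, perf_loss):
    """Return metrics normalized to the three-big-core baseline (baseline = 1.0).

    Assumptions (not taken from the paper):
    - delay is the inverse of performance, so a loss p gives delay 1 / (1 - p)
    - EDP is energy * delay, and EDAP is energy * delay * area
    """
    power = 1.0 - power_reduction
    energy = 1.0 - energy_reduction
    area = 1.0 - area_reduction
    delay = 1.0 / (1.0 - perf_loss)
    edp = energy * delay
    edap = edp * area
    return {
        "power": power,
        "energy": energy,
        "area": area,
        "delay": delay,
        "edp": edp,
        "edap": edap,
    }


if __name__ == "__main__":
    metrics = relative_metrics(
        POWER_REDUCTION, ENERGY_REDUCTION, AREA_REDUCTION, PERF_LOSS
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions both products come out below 1.0, meaning the small delay penalty is outweighed by the energy and area savings. A different metric definition (for example, one that weights delay more heavily) could shift the numbers.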
