IEEE International Symposium on High Performance Computer Architecture

Dynamically Specialized Datapaths for energy efficient computing



Abstract

Due to limits in technology scaling, energy efficiency of logic devices is decreasing in successive generations. To provide continued performance improvements without increasing power, regardless of the sequential or parallel nature of the application, microarchitectural energy efficiency must improve. We propose Dynamically Specialized Datapaths to improve the energy efficiency of general purpose programmable processors. The key insights of this work are the following. First, applications execute in phases and these phases can be determined by creating a path-tree of basic-blocks rooted at the inner-most loop. Second, specialized datapaths corresponding to these path-trees, which we refer to as DySER blocks, can be constructed by interconnecting a set of heterogeneous computation units with a circuit-switched network. These blocks can be easily integrated with a processor pipeline. A synthesized RTL implementation using an industry 55nm technology library shows a 64-functional-unit DySER block occupies approximately the same area as a 64 KB single-ported SRAM and can execute at 2 GHz. We extend the GCC compiler to identify path-trees and map code to DySER, and evaluate the PARSEC, SPEC, and Parboil benchmark suites. Our results show that in most cases two DySER blocks can achieve the same performance (within 5%) as having a specialized hardware module for each path-tree. A 64-FU DySER block can cover 12% to 100% of the dynamically executed instruction stream. When integrated with a dual-issue out-of-order processor, two DySER blocks provide a geometric mean speedup of 2.1X (1.15X to 10X), a geometric mean energy reduction of 40% (up to 70%), and a 60% energy reduction if no performance improvement is required.
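The abstract does not spell out how a path-tree is built, so the following is only a minimal sketch of one plausible reading: starting at the header block of an innermost loop, follow successor edges inside the loop body and record every acyclic path as a branch of a tree. The control-flow graph, the block names (`B0` to `B3`), and the helpers `build_path_tree` and `enumerate_paths` are hypothetical and are not taken from the paper or from GCC.

```python
from dataclasses import dataclass, field


@dataclass
class PathTreeNode:
    """One basic block in the tree; children are its in-loop successors."""
    block: str
    children: list = field(default_factory=list)


def build_path_tree(cfg, header, loop_blocks):
    """Unroll the paths through a loop body into a tree rooted at `header`.

    Assumption: the loop is innermost, so the only cycle among
    `loop_blocks` is the back edge to `header`; otherwise this recursion
    would not terminate.
    """
    def expand(block):
        node = PathTreeNode(block)
        for succ in cfg.get(block, []):
            # Skip the back edge to the header and any edge leaving the loop.
            if succ == header or succ not in loop_blocks:
                continue
            node.children.append(expand(succ))
        return node

    return expand(header)


def enumerate_paths(node, prefix=()):
    """List every root-to-leaf path in the tree as a tuple of block names."""
    path = prefix + (node.block,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(enumerate_paths(child, path))
    return paths


# Hypothetical loop: B0 branches to B1 or B2, both rejoin at B3,
# and B3 either loops back to B0 or exits.
cfg = {
    "B0": ["B1", "B2"],
    "B1": ["B3"],
    "B2": ["B3"],
    "B3": ["B0", "exit"],
}
loop_blocks = {"B0", "B1", "B2", "B3"}

tree = build_path_tree(cfg, "B0", loop_blocks)
paths = enumerate_paths(tree)
for p in paths:
    print(" -> ".join(p))
```

In this toy loop the tree has two branches, one through `B1` and one through `B2`. Under the reading assumed here, a region of this kind would be the unit a compiler considers for mapping onto a specialized datapath; the actual selection and mapping criteria are described in the paper itself, not in this sketch.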
