Journal of Parallel and Distributed Computing

Compiler-assisted energy optimization for clustered VLIW processors

Abstract

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overhead in the form of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to execution delays and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects, improving the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains, on average, 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure.
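
As a back-of-the-envelope reading of the figures above, the following sketch works out what energy reduction the quoted energy-delay product (EDP) numbers would imply. It rests on assumptions that the abstract does not state: that EDP is energy multiplied by execution time, that "improvement" means a relative reduction in EDP, and that "performance degradation" means a relative increase in execution time. The helper name `implied_energy_ratio` is hypothetical, and the resulting energy figures are derived here for illustration only; they are not results reported by the paper.

```python
def implied_energy_ratio(edp_improvement: float, slowdown: float) -> float:
    """Return E_new / E_old implied by an EDP reduction and a delay increase.

    Assumes EDP = energy * delay, so
    (E_new * D_new) / (E_old * D_old) = 1 - edp_improvement
    and D_new / D_old = 1 + slowdown.
    """
    edp_ratio = 1.0 - edp_improvement   # new EDP relative to the baseline
    delay_ratio = 1.0 + slowdown        # new execution time relative to the baseline
    return edp_ratio / delay_ratio


# Figures quoted in the abstract: (clusters, EDP improvement, performance degradation).
# The pairing follows the abstract's "respectively".
quoted = [(2, 0.08, 0.035), (4, 0.10, 0.02)]

for clusters, edp_gain, slowdown in quoted:
    ratio = implied_energy_ratio(edp_gain, slowdown)
    print(f"{clusters}-cluster: implied energy ratio {ratio:.3f} "
          f"(about {100 * (1 - ratio):.1f}% less energy)")
```

Under these assumptions the implied energy reduction is slightly larger than the EDP improvement itself, because part of the energy saving is offset by the longer execution time.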
