2011 13th International Symposium on Integrated Circuits

A 2.72GOPS/11mW low power reconfigurable accelerator with a highly parallel datapath consisting of combinatorial circuits in 65nm CMOS

Abstract

CMA (Cool Mega-Array) is a highly energy-efficient reconfigurable accelerator for battery-driven mobile devices. It consists of a large array of processing elements (PEs) without memory elements, onto which the data-flow graph of the running application is mapped; a small, simple programmable micro-controller for data management; and a data memory. Unlike traditional coarse-grained reconfigurable processors, in which each PE provides registers and context memory, a CMA reduces power consumption by eliminating the power spent on switching hardware contexts, on storing intermediate data in registers, and on distributing the clock to those registers. Although the data-flow graph mapped onto the PE array is static during execution, various application programs can be implemented by making the best use of the flexible data management instructions in the micro-controller. When the delay time of the PE array is shorter than the data handling time taken by the micro-controller, the supply voltage of the PE array is scaled down to reduce power consumption without degrading performance. In contrast, when the delay time of the PE array is longer, wave pipelining is applied to enhance the performance of the PE array. A prototype CMA chip (CMA-1, 2.1 × 4.2 mm) with an 8 × 8 PE array and 24-bit data width was fabricated in 65-nm CMOS technology and achieves a sustained performance of 2.5 GOPS at 11.2 mW. This energy efficiency is comparable to that of the most energy-efficient accelerators that have been reported.
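The efficiency figures and the delay-based choice between voltage scaling and wave pipelining can be illustrated with a small sketch. It is not taken from the paper: the function names, the mode labels, the nanosecond units, and the handling of the equal-delay case are all assumptions made for illustration. Only the 2.5 GOPS / 11.2 mW and 2.72 GOPS / 11 mW figures come from the abstract and title.

```python
def gops_per_watt(gops: float, milliwatts: float) -> float:
    """Energy efficiency in GOPS/W from throughput (GOPS) and power (mW)."""
    return gops / (milliwatts / 1000.0)


def choose_mode(pe_delay_ns: float, data_handling_ns: float) -> str:
    """Hypothetical restatement of the rule described in the abstract.

    The mode names and the equal-delay case are assumptions, not the
    paper's terminology.
    """
    if pe_delay_ns < data_handling_ns:
        # PE array has slack: lower its supply voltage to save power.
        return "scale_voltage"
    if pe_delay_ns > data_handling_ns:
        # PE array is the bottleneck: apply wave pipelining.
        return "wave_pipelining"
    return "balanced"  # assumption: the abstract does not cover this case


# Figures quoted in the abstract (sustained) and in the title.
sustained = gops_per_watt(2.5, 11.2)
headline = gops_per_watt(2.72, 11.0)

print(f"sustained: {sustained:.1f} GOPS/W")
print(f"title:     {headline:.1f} GOPS/W")
print(choose_mode(pe_delay_ns=8.0, data_handling_ns=10.0))
print(choose_mode(pe_delay_ns=15.0, data_handling_ns=10.0))
```

Under these figures the sustained efficiency works out to roughly 223 GOPS/W and the title figure to roughly 247 GOPS/W. The delay values passed to `choose_mode` are made-up inputs, not measurements.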