Source: Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International

A 1.45GHz 52-to-162GFLOPS/W variable-precision floating-point fused multiply-add unit with certainty tracking in 32nm CMOS

Abstract

High-throughput floating-point computations are key building blocks of 3D graphics, signal processing and high-performance computing workloads [1,2]. Higher floating-point precisions offer improved accuracy at the expense of performance and energy efficiency, with variable-precision floating-point circuits providing run-time precision selection [3]. Real-time certainty tracking enables variable-precision circuits not only to operate at the higher energy efficiency of low-precision datapaths, but also to preserve high-precision accuracy. A variable-precision floating-point unit that performs fused multiply-adds (FMA) with single-cycle throughput while supporting operation in either 1-way single-precision (24b mantissa), 2-way 12b precision or 4-way 6b precision modes is fabricated in 32nm High-k/Metal-gate CMOS [4]. Simultaneous floating-point certainty tracking, preshifted addends, a combined rounding and negation incrementer, efficient reuse of mantissa datapath for multiple parallel lower precision calculations, robust ultra-low voltage circuits, and fine-grained clock gating enable nominal energy efficiency of 52GFLOPS/W (IEEE 32b single-precision, measured at 1.45GHz, 1.05V, 25°C) with a dense layout occupying 0.045mm² (Fig. 10.3.7) while achieving: (i) scalable performance up to 3.6GFLOPS (single-precision), 96mW measured at 1.2V; (ii) up to 4× higher throughput of 14.4GFLOPS with variable-precision, while maintaining single-precision accuracy; (iii) fast single-cycle precision reconfigurability; (iv) precision mode-dependent power consumption for up to 40% clock power reduction; (v) near-threshold single-precision operation measured at 300mV, 1.75MHz, 11μW; and (vi) peak energy efficiency of 321GFLOPS/W (single-precision) and 1.2TFLOPS/W (6b precision) at 325mV, 25°C.
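
The throughput and efficiency figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope consistency check, not the authors' measurement method. It assumes (the abstract does not say so explicitly) that one FMA counts as 2 floating-point operations and that an N-way mode completes N FMAs per cycle. The function names are hypothetical, and the derived values (implied clock frequency at 1.2V, implied nominal power) are inferences from the quoted numbers rather than reported results.

```python
# Back-of-envelope check of the abstract's figures.
# Assumption (not stated in the abstract): 1 FMA = 2 FLOPs (multiply + add),
# and an N-way precision mode completes N FMAs per cycle.
FLOPS_PER_FMA = 2


def throughput_gflops(freq_ghz: float, ways: int = 1) -> float:
    """Peak throughput in GFLOPS for a single-cycle-throughput FMA unit."""
    return FLOPS_PER_FMA * ways * freq_ghz


def efficiency_gflops_per_w(gflops: float, power_w: float) -> float:
    """Energy efficiency in GFLOPS/W."""
    return gflops / power_w


# Nominal point: 1.45 GHz single-precision at 52 GFLOPS/W.
nominal_gflops = throughput_gflops(1.45)          # 2.9 GFLOPS
implied_power_w = nominal_gflops / 52.0           # about 0.056 W (inferred)

# 1.2 V point: 3.6 GFLOPS at 96 mW implies a 1.8 GHz clock (inferred).
implied_freq_ghz = 3.6 / FLOPS_PER_FMA
eff_1v2 = efficiency_gflops_per_w(3.6, 0.096)     # 37.5 GFLOPS/W

# 4-way 6b mode at the same implied clock: 4x the single-precision rate.
four_way_gflops = throughput_gflops(implied_freq_ghz, ways=4)

# Near-threshold point: 1.75 MHz at 11 uW, single precision.
near_vt_eff = efficiency_gflops_per_w(throughput_gflops(1.75e-3), 11e-6)

if __name__ == "__main__":
    print(f"nominal throughput  : {nominal_gflops:.2f} GFLOPS")
    print(f"implied power       : {implied_power_w * 1e3:.1f} mW")
    print(f"implied clock @1.2V : {implied_freq_ghz:.2f} GHz")
    print(f"efficiency @1.2V    : {eff_1v2:.1f} GFLOPS/W")
    print(f"4-way throughput    : {four_way_gflops:.1f} GFLOPS")
    print(f"near-threshold eff. : {near_vt_eff:.0f} GFLOPS/W")
```

Under these assumptions the 4-way figure reproduces the quoted 14.4GFLOPS, and the 300mV operating point works out to roughly 318GFLOPS/W, close to the 321GFLOPS/W peak reported at 325mV.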