Published in: Design, Automation & Test in Europe Conference & Exhibition

A Transprecision Floating-Point Platform for Ultra-Low Power Computing


Abstract

In modern low-power embedded platforms, the execution of floating-point (FP) operations is a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. Adopting FP formats that require fewer bits is an interesting opportunity to reduce energy consumption: it simplifies the arithmetic circuitry and, by enabling vectorization, reduces the memory bandwidth required to transfer data between memory and registers. From a theoretical point of view, the adoption of multiple FP types fits the principle of transprecision computing, which allows fine-grained control of approximation while meeting specified constraints on the precision of final results. In this paper, we propose an extended FP type system with complete hardware support to enable transprecision computing on low-power embedded processors, including two standard formats (binary32 and binary16) and two new formats (binary8 and binary16alt). First, we introduce a software library that enables exploration of FP types by tuning both the precision and the dynamic range of program variables. Then, we present a methodology to integrate our library with an external tool for precision tuning, together with experimental results that highlight the clear benefits of introducing the new formats. Finally, we present the design of a transprecision FP unit capable of handling 8-bit and 16-bit operations in addition to standard 32-bit operations. Experimental results on FP-intensive benchmarks show that up to 90% of FP operations can be safely scaled down to 8-bit or 16-bit formats. Thanks to precision tuning and vectorization, execution time decreases by 12% and memory accesses by 27% on average, reducing energy consumption by up to 30%.
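The trade-off between precision and dynamic range mentioned in the abstract can be illustrated with a small software emulation. The sketch below is not the authors' library. It is a minimal Python illustration that rounds a double to a format with a chosen number of exponent and mantissa bits, using round-to-nearest-even. The function name `quantize` and the `FORMATS` table are hypothetical, and the bit widths listed for binary8 and binary16alt are assumptions, since the abstract does not state them.

```python
import math

# (exponent_bits, mantissa_bits) per format. binary32 and binary16 follow
# IEEE 754; the binary16alt and binary8 widths are ASSUMPTIONS made for
# illustration and are not stated in the abstract.
FORMATS = {
    "binary32": (8, 23),
    "binary16": (5, 10),
    "binary16alt": (8, 7),
    "binary8": (5, 2),
}


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value representable with the given widths."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias                  # smallest normal exponent
    e_max = bias                      # largest normal exponent
    _, e = math.frexp(abs(x))         # abs(x) = m * 2**e with 0.5 <= m < 1
    e -= 1                            # rescale so the significand is in [1, 2)
    e = max(e, e_min)                 # subnormals share the exponent e_min
    step = math.ldexp(1.0, e - man_bits)   # spacing of representable values
    q = round(abs(x) / step) * step        # Python's round() ties to even
    max_val = (2.0 - math.ldexp(1.0, -man_bits)) * math.ldexp(1.0, e_max)
    if q > max_val:                   # overflow saturates to infinity
        q = math.inf
    return math.copysign(q, x)


if __name__ == "__main__":
    for name, (e_bits, m_bits) in FORMATS.items():
        print(name, quantize(3.14159, e_bits, m_bits), quantize(70000.0, e_bits, m_bits))
```

Under these assumed widths, a 16-bit format with a 5-bit exponent overflows on 70000.0, while a 16-bit format with an 8-bit exponent keeps it in range at the cost of fewer mantissa bits. This is the kind of per-variable choice that precision tuning explores.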
