Conference: 2018 Design, Automation & Test in Europe Conference & Exhibition

A transprecision floating-point platform for ultra-low power computing



Abstract

In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. Adopting FP formats that require fewer bits is an interesting opportunity to reduce energy consumption, since it simplifies the arithmetic circuitry and, by enabling vectorization, reduces the memory bandwidth required to transfer data between memory and registers. From a theoretical point of view, the adoption of multiple FP types fits the principle of transprecision computing, allowing fine-grained control of approximation while meeting specified constraints on the precision of final results. In this paper, we propose an extended FP type system with complete hardware support to enable transprecision computing on low-power embedded processors, including two standard formats (binary32 and binary16) and two new formats (binary8 and binary16alt). First, we introduce a software library that enables exploration of FP types by tuning both the precision and the dynamic range of program variables. Then, we present a methodology to integrate our library with an external tool for precision tuning, and experimental results that highlight the clear benefits of introducing the new formats. Finally, we present the design of a transprecision FP unit capable of handling 8-bit and 16-bit operations in addition to standard 32-bit operations. Experimental results on FP-intensive benchmarks show that up to 90% of FP operations can be safely scaled down to 8-bit or 16-bit formats. Thanks to precision tuning and vectorization, execution time is decreased by 12% and memory accesses are reduced by 27% on average, leading to a reduction in energy consumption of up to 30%.
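The idea of tuning both precision and dynamic range can be illustrated with a small emulation sketch. It is not the library described in the abstract; the function `quantize` and the table `FORMATS` are hypothetical names introduced here. The field widths for binary32 and binary16 follow IEEE 754, while the widths used for binary16alt (8 exponent bits, 7 mantissa bits) and binary8 (5 exponent bits, 2 mantissa bits) are assumptions, since the abstract does not state them.

```python
import math

# (exponent bits, mantissa bits) per format.
# binary32 and binary16 are the IEEE 754 layouts.
# binary16alt and binary8 widths are ASSUMED; the abstract does not give them.
FORMATS = {
    "binary32": (8, 23),
    "binary16": (5, 10),
    "binary16alt": (8, 7),
    "binary8": (5, 2),
}


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of an IEEE-like format.

    Uses round-to-nearest-even, supports subnormals, and overflows to
    infinity. The result is returned as an ordinary Python float.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias  # exponent of the smallest normal number
    e_max = bias      # exponent of the largest normal number

    _, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    e -= 1                     # rescale so that the significand is in [1, 2)
    e = max(e, e_min)          # subnormals share the minimum exponent

    step = math.ldexp(1.0, e - man_bits)  # spacing of representable values
    q = round(abs(x) / step) * step       # round() rounds halves to even

    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    if q > max_val:
        q = math.inf
    return math.copysign(q, x)


if __name__ == "__main__":
    for name, (eb, mb) in FORMATS.items():
        print(name, quantize(0.1, eb, mb), quantize(70000.0, eb, mb))
```

Under these assumed widths, the sketch shows the trade-off between the two 16-bit formats: binary16 keeps more mantissa bits but overflows at 70000, whereas binary16alt keeps the value finite at a coarser precision.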
