Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques

Luo Z.; Martonosi M.

Abstract

The speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees: store results in an intermediate form and delay the addition until the end of a repeated calculation such as an accumulation or dot product. This effectively removes carry-propagation overhead from the calculation's critical path. We present both integer and floating-point designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, extended with delayed addition. This design achieves a 72 MHz clock rate on an XC4036xla-9 FPGA and a 170 MHz clock rate on an XCV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and a 97 MHz clock rate on an XCV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves an 80 MHz clock rate on an XC4036xla-9 FPGA and a 150 MHz clock rate on an XCV100epq240-8 FPGA.
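The core idea of delayed addition for integer accumulation can be illustrated with a small software model. The sketch below is not the authors' hardware design. It assumes a 32-bit accumulator and uses hypothetical names (`MASK`, `csa`, `accumulate_delayed`) to show how a running total can be held as a separate sum word and carry word, so that each step needs only bitwise logic and a single carry-propagating addition is performed at the very end.

```python
MASK = (1 << 32) - 1  # assumed 32-bit accumulator width (illustrative choice)


def csa(a, b, c):
    """3:2 carry-save step: a + b + c == s + carry (mod 2**32).

    Each output bit depends only on the three input bits in the same
    position, so no carry ripples across the word.
    """
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved up one place
    return s & MASK, carry & MASK


def accumulate_delayed(values):
    """Accumulate values while keeping the total in (sum, carry) form."""
    s, c = 0, 0
    for v in values:
        s, c = csa(s, c, v & MASK)   # no carry propagation inside the loop
    return (s + c) & MASK            # the one delayed carry-propagate addition


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(accumulate_delayed(data))  # agrees with sum(data) modulo 2**32
```

In hardware terms, the loop body corresponds to a row of independent full adders, while the final `s + c` corresponds to the one conventional adder whose delay is paid only once per accumulation.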


Bibliographic details

Source: IEEE Transactions on Computers, 2000, No. 3, pp. 208-218 (11 pages)
Authors: Luo Z.; Martonosi M.
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology

