Journal: IEEE Transactions on Computers

Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques

Abstract

The speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated carry hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees: store the results in an intermediate form and delay addition until the end of a repeated calculation such as accumulation or dot-product. This effectively removes carry-propagation overhead from the calculation's critical path. We present both integer and floating-point designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, but with delayed addition as well. This design achieves a 72 MHz clock rate on an XC4036xla-9 FPGA and a 170 MHz clock rate on an XV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and a 97 MHz clock rate on an XV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves an 80 MHz clock rate on an XC4036xla-9 FPGA and a 150 MHz clock rate on an XCV100epq240-8 FPGA.
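
The idea of keeping the running total in an intermediate form can be illustrated in software. The sketch below is not the paper's hardware design; it is a minimal integer model that assumes the intermediate form is a carry-save pair (a sum word and a carry word, as in a Wallace-tree 3:2 compressor). The function names `carry_save_step` and `delayed_accumulate` and the 32-bit default width are assumptions made for illustration only.

```python
def carry_save_step(sum_word, carry_word, operand, width=32):
    """Fold one operand into a (sum, carry) pair without propagating carries.

    Each bit position acts as an independent full adder, so in hardware
    the delay of this step would not grow with the word width.
    """
    mask = (1 << width) - 1
    # Per-bit sum of the three inputs, ignoring carries.
    new_sum = (sum_word ^ carry_word ^ operand) & mask
    # Per-bit carry-out (majority function), shifted to the next position.
    majority = (sum_word & carry_word) | (sum_word & operand) | (carry_word & operand)
    new_carry = (majority << 1) & mask
    return new_sum, new_carry


def delayed_accumulate(values, width=32):
    """Accumulate values modulo 2**width with a single final carry-propagate add."""
    mask = (1 << width) - 1
    sum_word, carry_word = 0, 0
    for value in values:
        sum_word, carry_word = carry_save_step(sum_word, carry_word, value & mask, width)
    # The only full addition: performed once, after the loop.
    return (sum_word + carry_word) & mask


if __name__ == "__main__":
    data = [3, 5, 7, 11]
    print(delayed_accumulate(data) == sum(data) % (1 << 32))
```

In this model the loop body uses only bitwise operations, and the one true addition happens after the last operand has been absorbed, which mirrors the abstract's point about moving carry propagation off the critical path of the repeated calculation. The floating-point alignment and overflow-avoidance techniques the abstract mentions are not modeled here.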
