Published in: IEEE Transactions on Computers

Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support


Abstract

Binary64 arithmetic is rapidly becoming inadequate for today's large-scale computations because of accumulated errors. Binary128 arithmetic is therefore now required to increase the accuracy and reliability of these computations. At the same time, a clear trend in modern processors is to extend their instruction sets with single-instruction, multiple-data (SIMD) execution, which can significantly accelerate data-parallel applications. To address these combined demands, this paper presents the architecture of a low-cost binary128 floating-point fused multiply-add (FMA) unit with SIMD support. The proposed FMA design can execute one binary128 FMA every other cycle with a latency of four cycles, or two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. We use two binary64 FMA units to support binary128 FMA, which requires much less hardware than a fully pipelined binary128 FMA. The presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
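
The throughput and latency figures quoted in the abstract can be turned into a rough cycle-count estimate. The following is a minimal sketch, not the authors' model: the `MODES` table layout, the function name `cycles_for`, and the assumption that the unit is fed a steady stream of independent operations are all illustrative. Only the lane counts, issue intervals, and latencies are taken from the abstract.

```python
import math

# mode -> (parallel lanes, issue interval in cycles, latency in cycles)
# Values are read off the abstract; the table itself is a hypothetical helper.
MODES = {
    "binary128": (1, 2, 4),  # one FMA every other cycle, 4-cycle latency
    "binary64": (2, 1, 3),   # two FMAs per cycle, 3-cycle latency
    "binary32": (4, 1, 3),   # four FMAs per cycle, 3-cycle latency
}


def cycles_for(mode, n_ops):
    """Estimate cycles to complete n_ops independent FMAs in one mode.

    Assumes no stalls and no data dependencies between operations.
    """
    if n_ops <= 0:
        return 0
    lanes, interval, latency = MODES[mode]
    issues = math.ceil(n_ops / lanes)  # number of issue slots needed
    # The last issue starts at (issues - 1) * interval and
    # completes `latency` cycles later.
    return (issues - 1) * interval + latency


if __name__ == "__main__":
    for mode in MODES:
        print(mode, cycles_for(mode, 1000))
```

Under these assumptions, the binary128 mode trades roughly a fourfold drop in sustained throughput relative to binary64 for the area and power savings the abstract reports; dependent operation chains would instead be bounded by the latency figures.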
