Venue: International Workshop on Reconfigurable Computing

A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

Abstract

As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it has become possible to use them to accelerate floating-point scientific computing applications. One calculation that is commonplace in scientific computing is the solution of systems of linear equations. A method that has proven very efficient and robust for finding such solutions in software is the Conjugate Gradient algorithm. In this paper we present a parallel hardware implementation of the Conjugate Gradient algorithm. The implementation is particularly suited to accelerating multiple small- to medium-sized dense systems of linear equations. Through parallelization, the computation time per iteration for a matrix of order n is reduced from Θ(n²) cycles in a software implementation to Θ(n) cycles. I/O requirements are scalable and converge to a constant value as the matrix order increases. Results on a VirtexII-6000 demonstrate a sustained performance of 5 GFLOPS, and projected results on a Virtex5-330 indicate a sustained performance of 35 GFLOPS. The former result is comparable to high-end CPUs, whereas the latter represents a significant speedup over them.
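For reference, the textbook Conjugate Gradient iteration for a dense symmetric positive-definite system is sketched below in plain software form. This sketch does not reproduce the paper's hardware design. Each iteration is dominated by one dense matrix-vector product, which costs n² multiply-adds when done serially. The helper `matvec_cycles` is a hypothetical, simplified cost model. It assumes each multiply-accumulate unit processes one row at a time, consumes one matrix element per cycle, and has no pipeline latency. Under that model, n units bring the product down from n² to n cycles. The names `conjugate_gradient`, `matvec`, `dot`, and `matvec_cycles`, the zero initial guess, and the tolerance are all illustrative assumptions.

```python
import math


def dot(u, v):
    # Inner product: n multiply-adds.
    return sum(x * y for x, y in zip(u, v))


def matvec(A, v):
    # Dense matrix-vector product: n*n multiply-adds when done serially,
    # which is the Theta(n^2) term of each CG iteration in software.
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a dense symmetric positive-definite A (illustrative sketch)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # exact arithmetic needs <= n steps; allow slack for rounding
    x = [0.0] * n          # assumed zero initial guess, so r = b - A x = b
    r = list(b)
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:   # stop once the residual norm is small
            break
        Ap = matvec(A, p)             # the dominant cost per iteration
        alpha = rs_old / dot(p, Ap)   # step length along search direction p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old        # keeps the new direction A-conjugate to the old ones
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def matvec_cycles(n, mac_units):
    # Hypothetical cycle model: each multiply-accumulate unit handles whole rows,
    # one matrix element per cycle; pipeline latency is ignored.
    rows_per_unit = math.ceil(n / mac_units)
    return rows_per_unit * n


if __name__ == "__main__":
    x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
    print(x)  # approximately [1/11, 7/11]
    print(matvec_cycles(64, 1), matvec_cycles(64, 64))  # 4096 64
```

Each iteration also performs two dot products and three vector updates. That remaining work is already Θ(n) even when done serially. So once the matrix-vector product is spread across n units, the whole iteration is Θ(n) in this simplified model, consistent with the per-iteration bound stated above.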