Journal: IEEE Transactions on Parallel and Distributed Systems

FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks


Abstract

Large-scale fluid dynamics simulation requires high-performance, low-power computation. Because the architectures of CPUs and GPUs are inefficient for this target application, it is now difficult to improve their power efficiency. FPGAs have become promising alternatives for power-efficient, high-performance computation thanks to new architectures with floating-point (FP) DSP blocks, but their relatively narrow memory bandwidth calls for an appropriate approach to fully exploit that advantage. This paper presents an architecture and design for scalable fluid simulation based on data-flow computing on a state-of-the-art FPGA. To exploit the available hardware resources, including FP DSPs, we introduce spatial and temporal parallelism, which scales performance further by adding more stream processing elements (SPEs) to an array. Performance modeling and a prototype implementation allow us to explore the design space for both the existing Altera Arria10 and the upcoming Intel Stratix10 FPGAs. We demonstrate that an Arria10 10AX115 FPGA achieves 519 GFlops at 9.67 GFlops/W with a stream bandwidth of only 9.0 GB/s, which is 97.9 percent of the peak performance of the 18 implemented SPEs. We also estimate that a Stratix10 FPGA can scale up to 6844 GFlops by adequately combining spatial and temporal parallelism.
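
The Arria10 figures quoted in the abstract imply a few quantities that the abstract does not state directly. The sketch below derives them with plain arithmetic. The function name and dictionary keys are hypothetical, and two assumptions are made: that power efficiency is sustained performance divided by power draw, and that peak performance divides evenly across the 18 SPEs. The results are back-of-the-envelope estimates, not numbers reported by the paper.

```python
# Figures quoted in the abstract for the Arria10 10AX115 prototype.
SUSTAINED_GFLOPS = 519.0         # sustained performance
EFFICIENCY_GFLOPS_PER_W = 9.67   # power efficiency
STREAM_BW_GB_PER_S = 9.0         # stream bandwidth
PEAK_FRACTION = 0.979            # sustained performance as a fraction of peak
NUM_SPES = 18                    # implemented stream processing elements


def derived_figures():
    """Return quantities implied by the abstract's numbers (hypothetical helper)."""
    # Assumption: efficiency = sustained performance / power draw.
    power_w = SUSTAINED_GFLOPS / EFFICIENCY_GFLOPS_PER_W
    # Peak performance of the 18-SPE array, from the 97.9 percent figure.
    peak_gflops = SUSTAINED_GFLOPS / PEAK_FRACTION
    # Assumption: peak performance is shared evenly by the SPEs.
    peak_gflops_per_spe = peak_gflops / NUM_SPES
    # Floating-point operations performed per byte of stream traffic.
    flops_per_byte = SUSTAINED_GFLOPS / STREAM_BW_GB_PER_S
    return {
        "power_w": power_w,
        "peak_gflops": peak_gflops,
        "peak_gflops_per_spe": peak_gflops_per_spe,
        "flops_per_byte": flops_per_byte,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the prototype draws roughly 54 W, the 18-SPE array peaks near 530 GFlops (about 29 GFlops per SPE), and each byte of stream traffic feeds about 58 floating-point operations. That high ratio is consistent with the abstract's point that the design works within a narrow memory bandwidth.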
