
FPGA-based tsunami simulation: Performance comparison with GPUs, and roofline model for scalability analysis



Abstract

MOST (Method Of Splitting Tsunami) is widely used to solve the shallow water equations (SWEs) for tsunami simulation. This paper presents a high-performance and power-efficient computation of MOST for practical tsunami simulation on an FPGA. The custom simulation hardware is based on a stream computing architecture that uses deep pipelining to increase performance under limited bandwidth. We design a stream processing element (SPE) that combines computing kernels with stencil buffers. We also introduce an SPE array architecture with spatial and temporal parallelism that further exploits the available hardware resources by implementing multiple SPEs with parallel internal pipelines. Our prototype implementation on an Arria 10 FPGA demonstrates that, by introducing non-dimensionalization, the FPGA-based design performs numerically stable tsunami simulation in single precision with real ocean-depth data. We explore the design space of SPE arrays and find that a design of six cascaded SPEs with a single pipeline achieves a sustained performance of 383 GFlops and a performance per power of 8.41 GFlops/W with a stream bandwidth of only 7.2 GB/s. These numbers are 8.6 and 17.2 times higher than those of an NVidia Tesla K20c GPU, and 1.7 and 7.1 times higher than those of an AMD Radeon R9 280X GPU, respectively, for the same tsunami simulation in single precision. Moreover, we propose a roofline model for stream computing with the SPE array in order to investigate the factors behind performance degradation and the possible performance improvement for given FPGAs. With the model, we estimate that an upcoming Stratix 10 GX2800 FPGA can achieve a sustained performance of at most 8.7 TFlops with our SPE array architecture for tsunami simulation.
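The figures quoted in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' model: it uses only the numbers stated above, and the function names are hypothetical. `classic_roofline` is the textbook roofline bound (the minimum of peak performance and bandwidth times operational intensity), which is an assumption here; the paper's roofline model for stream computing with the SPE array may differ. Because the speedup ratios in the abstract are rounded, the GPU baselines derived from them are approximate.

```python
# Figures quoted in the abstract (Arria 10, six cascaded SPEs, single pipeline).
SUSTAINED_GFLOPS = 383.0   # sustained performance, GFlops
GFLOPS_PER_WATT = 8.41     # performance per power, GFlops/W
STREAM_BW_GBPS = 7.2       # stream bandwidth, GB/s


def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a performance and a performance-per-power figure."""
    return gflops / gflops_per_watt


def flops_per_byte(gflops, bandwidth_gbps):
    """Floating-point operations performed per byte of streamed data."""
    return gflops / bandwidth_gbps


def classic_roofline(peak_gflops, bandwidth_gbps, intensity):
    """Textbook roofline bound (assumed form, not the paper's SPE-array model)."""
    return min(peak_gflops, bandwidth_gbps * intensity)


def baseline_from_ratio(value, ratio):
    """Recover a baseline figure from 'value is ratio times higher'."""
    return value / ratio


if __name__ == "__main__":
    print("implied FPGA power (W):",
          round(implied_power_watts(SUSTAINED_GFLOPS, GFLOPS_PER_WATT), 1))
    print("operations per streamed byte:",
          round(flops_per_byte(SUSTAINED_GFLOPS, STREAM_BW_GBPS), 1))
    # GPU baselines implied by the rounded ratios in the abstract.
    print("implied K20c GFlops:",
          round(baseline_from_ratio(SUSTAINED_GFLOPS, 8.6), 1))
    print("implied R9 280X GFlops:",
          round(baseline_from_ratio(SUSTAINED_GFLOPS, 1.7), 1))
```

Under these assumptions the FPGA design draws roughly 45.5 W and performs about 53 floating-point operations per streamed byte. That high ratio is consistent with the abstract's point that cascading SPEs raises performance without raising the required stream bandwidth.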

Bibliographic details

  • Source
    Journal of Parallel and Distributed Computing | 2017, Issue 8 | pp. 153-169 | 17 pages
  • Author affiliations

    Graduate School of Information Sciences, Tohoku University, 6-6-01 Aramaki-aza Aoba, Aoba, Sendai, Miyagi 980-8579, Japan;

    Graduate School of Information Sciences, Tohoku University, 6-6-01 Aramaki-aza Aoba, Aoba, Sendai, Miyagi 980-8579, Japan;

    School of Computer Science and Engineering, The University of Aizu, Ikki-machi Tsuruga, Aizuwakamatsu, Fukushima 965-8580, Japan;

    School of Computer Science and Engineering, The University of Aizu, Ikki-machi Tsuruga, Aizuwakamatsu, Fukushima 965-8580, Japan;

  • Original format: PDF
  • Language: English
  • Keywords

    Tsunami simulation; Stream computing; Custom hardware; FPGA; GPU; Roofline model;

