Conference: IEEE Annual International Symposium on Field-Programmable Custom Computing Machines

OpenCL-Based FPGA Design to Accelerate the Nodal Discontinuous Galerkin Method for Unstructured Meshes

Abstract

The exploration of FPGAs as accelerators for scientific simulations has so far focused mostly on small kernels of methods that work on regular data structures, for example stencil computations for finite difference methods. Computational science often employs more advanced methods that promise better stability, convergence, locality, and scaling. Compared to regular grids, unstructured meshes have been shown to represent computational domains of various shapes more effectively and more accurately. On unstructured meshes, the discontinuous Galerkin method preserves the ability to perform explicit local update operations for time-domain simulations. In this work, we investigate FPGAs as a target platform for an implementation of the nodal discontinuous Galerkin method that finds time-domain solutions of Maxwell's equations on an unstructured mesh. When data reuse is maximized and constant coefficients are fitted into suitably partitioned on-chip memory, the high computational intensity allows us to implement and feed wide data paths with hundreds of floating-point operators. By decoupling off-chip memory accesses from the computations, high memory bandwidth can be sustained even for the irregular access patterns required by parts of the application. Using the Intel/Altera OpenCL SDK for FPGAs, we present implementation variants for different polynomial orders of the method. In different phases of the algorithm, either the computational or the bandwidth limits of the Arria 10 platform are almost reached, so that the design outperforms a highly multithreaded CPU implementation by around 2x.
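To make the link between constant coefficients, data reuse, and computational intensity more concrete, the following is a minimal sketch and not the paper's implementation. It assumes tetrahedral elements, single-precision values, and a single dense element-local operator that stays in on-chip memory; the function names (`nodes_per_element`, `apply_local_operator`, `intensity`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The element type (tetrahedra), the 4-byte values,
# and all names below are assumptions, not details taken from the abstract.

def nodes_per_element(order):
    """Node count of a nodal tetrahedral element of the given polynomial order."""
    return (order + 1) * (order + 2) * (order + 3) // 6


def apply_local_operator(op, element_values):
    """One explicit, element-local update step: a dense matrix-vector product.

    `op` holds constant coefficients shared by every element, so it can be
    kept on chip and reused; only `element_values` changes per element.
    """
    return [sum(c * v for c, v in zip(row, element_values)) for row in op]


def intensity(order, bytes_per_value=4):
    """Flops per byte of off-chip traffic for one operator application.

    Assumes the operator is already on chip, each nodal value is read once,
    and each result is written once.
    """
    n = nodes_per_element(order)
    flops = n * (2 * n - 1)            # n multiplies and n - 1 adds per row
    traffic = 2 * n * bytes_per_value  # n values read plus n values written
    return flops / traffic


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, nodes_per_element(order), intensity(order))
```

Under these assumptions the work per element grows quadratically with the node count while the off-chip traffic grows only linearly, which is one way to read the abstract's remark that different polynomial orders call for different implementation variants.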