Techniques for solving stiff chemical kinetics on GPUs

Abstract

An implicit and an explicit integration algorithm were ported to CUDA graphics processing units (GPUs) and used to concurrently solve sets of independent ordinary differential equations (ODEs) arising from finite-rate chemical kinetics. The GPU-enabled, 4th-order accurate adaptive Runge-Kutta-Fehlberg (RKF45) ODE solver achieved a maximum speed-up of 20.2x over the run-time of the baseline implicit, 5th-order accurate DVODE solver on the CPU for large numbers of ODEs, with comparable solution accuracy. The GPU implementation of the DVODE solver achieved a maximum speed-up of 7.7x over the baseline CPU run-time. The performance impact of mapping one thread to each ODE was compared with mapping an entire CUDA thread-block to each ODE (i.e., multiple threads per ODE). The one-thread-per-ODE approach achieved greater overall speed-up than the one-block-per-ODE approach, but only when the number of ODEs was large: 1,000 ODEs were needed just to break even with the scalar CPU version, and over 50,000 ODEs were needed to reach maximum parallel efficiency. The performance difference was most pronounced with the RKF45 algorithm, where the peak performance of the one-thread-per-ODE method was nearly 2x that of the one-block-per-ODE approach. The one-block-per-ODE implementations of RKF45 and DVODE both achieved lower peak speed-ups but outperformed the scalar CPU version with as few as 100 ODEs. The new GPU-enabled ODE solvers demonstrate a method to significantly reduce the computational cost of detailed finite-rate combustion simulations, with turn-around cost savings exceeding an order of magnitude.
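As a rough illustration of the explicit solver named in the abstract, the following is a minimal CPU-side sketch of adaptive RKF45 applied to a batch of independent ODEs. It is not the authors' CUDA code. The test problem (scalar decay, dy/dt = -k*y), the tolerances, the step-size controller constants, and the function names `rkf45_step`, `integrate`, and `solve_batch` are all assumptions made for this example. Each iteration of the batch loop is independent of the others, which is the property that a one-thread-per-ODE mapping would exploit by giving each iteration its own GPU thread.

```python
import math

def rkf45_step(f, t, y, h):
    """One Runge-Kutta-Fehlberg step: returns the 4th-order solution
    and the |5th-order - 4th-order| local error estimate."""
    k1 = h * f(t, y)
    k2 = h * f(t + h / 4, y + k1 / 4)
    k3 = h * f(t + 3 * h / 8, y + 3 * k1 / 32 + 9 * k2 / 32)
    k4 = h * f(t + 12 * h / 13,
               y + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
    k5 = h * f(t + h,
               y + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
    k6 = h * f(t + h / 2,
               y - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
               + 1859 * k4 / 4104 - 11 * k5 / 40)
    y4 = y + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
    y5 = (y + 16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
          - 9 * k5 / 50 + 2 * k6 / 55)
    return y4, abs(y5 - y4)

def integrate(f, t0, t1, y0, rtol=1e-8, atol=1e-12):
    """Adaptive integration of a scalar ODE from t0 to t1.
    Tolerances and controller constants are illustrative assumptions."""
    t, y, h = t0, y0, (t1 - t0) / 100
    while t < t1:
        h = min(h, t1 - t)              # do not step past the end time
        y_new, err = rkf45_step(f, t, y, h)
        tol = atol + rtol * abs(y)
        if err <= tol:                  # accept the step
            t, y = t + h, y_new
        # Grow or shrink h based on the error estimate (safety factor 0.9).
        scale = 0.9 * (tol / err) ** 0.2 if err > 0 else 4.0
        h *= min(4.0, max(0.1, scale))
    return y

def solve_batch(rate_constants, t_end=1.0):
    """Solve dy/dt = -k*y, y(0) = 1 for each k. The problems are independent,
    so each loop iteration could run on its own GPU thread."""
    return [integrate(lambda t, y, k=k: -k * y, 0.0, t_end, 1.0)
            for k in rate_constants]

if __name__ == "__main__":
    ks = [0.5, 5.0, 50.0]
    for k, y in zip(ks, solve_batch(ks)):
        print(k, y, math.exp(-k))       # numerical vs. exact solution
```

Larger rate constants force the explicit method to take smaller steps, so different problems in the batch finish after different numbers of steps. An implicit solver such as DVODE avoids this stability restriction at the price of solving a nonlinear system at each step.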