GPU Acceleration of Runge-Kutta Integrators

Murray Lawrence

Abstract

We consider the use of commodity graphics processing units (GPUs) for the common task of numerically integrating ordinary differential equations (ODEs), achieving speedups of up to 115-fold over comparable serial CPU implementations, and 15-fold over multithreaded CPU code with SIMD intrinsics. Using Lorenz '96 models as a case study, single and double precision benchmarks are established for both the widely used DOPRI5 method and computationally tailored low-storage RK4(3)5[2R+]C. A range of configurations are assessed on each, including multithreading and SIMD intrinsics on the CPU, and GPU kernels parallelized over both the dimensionality of the ODE system and number of trajectories. On the GPU, we draw particular attention to the problem of variable task-length among threads of the same warp, proposing a lightweight strategy of assigning multiple data items to each thread to reduce the prevalence of redundant operations. A simple analysis suggests that the strategy can draw performance close to that of ideal parallelism, while empirical results demonstrate up to a 10 percent improvement over the standard approach.
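The variable task-length problem described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's analysis. It assumes that threads in a warp run in lockstep, that a batch of work ends only when its slowest thread ends, and that each trajectory needs a known number of integrator steps. The names `warp_cost` and `efficiency`, the strided assignment of items to threads, and the synthetic step counts are all hypothetical choices made for illustration.

```python
import random


def warp_cost(steps, items_per_thread, warp_size=32):
    """Lockstep cost, in integrator steps, of running all trajectories on one warp.

    steps[i] is the number of steps trajectory i needs (assumed known here).
    Trajectories are taken in batches of warp_size * items_per_thread. Inside a
    batch, thread t works through items t, t + warp_size, ... one after another,
    and the batch ends when the slowest thread ends.
    """
    batch = warp_size * items_per_thread
    if len(steps) % batch != 0:
        raise ValueError("len(steps) must be a multiple of warp_size * items_per_thread")
    total = 0
    for start in range(0, len(steps), batch):
        chunk = steps[start:start + batch]
        # Each thread's work is the sum of its items; the warp waits for the maximum.
        total += max(sum(chunk[t::warp_size]) for t in range(warp_size))
    return total


def efficiency(steps, items_per_thread, warp_size=32):
    """Fraction of lane-steps doing useful work; 1.0 means no lane ever idles."""
    ideal = sum(steps) / warp_size  # cost if the work were spread perfectly evenly
    return ideal / warp_cost(steps, items_per_thread, warp_size)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic step counts standing in for adaptive step-size variation.
    steps = [rng.randint(50, 150) for _ in range(32 * 8 * 4)]
    for k in (1, 2, 4, 8):
        print(k, round(efficiency(steps, k), 3))
```

In this model, giving each thread several items replaces a sum of per-batch maxima with a maximum of per-thread sums, which can never be larger under the strided assignment used here. The figures it prints depend entirely on the synthetic step counts and should not be compared with the measured improvement reported in the abstract.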


Bibliographic details

Source
Parallel and Distributed Systems, IEEE Transactions on | 2012, Issue 1 | pp. 94-101 | 8 pages
Author
Murray Lawrence
Author affiliation
CSIRO Mathematics, Wembley
Original format
PDF
Language
English
Keywords
GPGPU; ordinary differential equations; Runge-Kutta integration; graphics hardware; initial value problems
