Journal of Parallel and Distributed Computing

Online auto-tuning for the time-step-based parallel solution of ODEs on shared-memory systems

Abstract

This article considers automatic performance tuning of time-step-based parallel solution methods for initial value problems (IVPs) of systems of ordinary differential equations (ODEs). We apply auto-tuning to the parallel execution of a class of explicit predictor-corrector (PC) methods of Runge-Kutta (RK) type on shared-memory architectures. The performance of parallel multi-threaded implementation variants of these methods depends on various factors only known at runtime, for example, the coupling structure of the ODE system to be solved, the memory access pattern resulting from this coupling structure, and the number of threads executing the program. We propose an online auto-tuning approach that exploits the time-stepping nature of ODE methods by selecting the best parallel implementation variant from a set of candidate implementations at runtime during the first time steps. Thus, the auto-tuning process is not isolated from the computation, but rather contributes to the progress of the solution process. The search space of candidate implementations is a priori reduced by estimating the synchronization overhead of each implementation variant. For implementation variants containing tiled loops, suitable tile sizes are selected using a heuristic empirical search guided by an analytical model. Runtime experiments with two different test problems show the efficiency of the online auto-tuning approach on two different shared-memory systems equipped with 48 and 1040 cores.
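
The core idea of the abstract, timing each candidate variant during the first time steps while those steps still advance the solution, can be illustrated with a minimal sequential sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the test problem `rhs` (y' = -y), the two hand-written Heun (explicit predictor-corrector) variants, and the names `solve_with_online_tuning` and `trials_per_variant` are hypothetical. The sketch is single-threaded and leaves out the synchronization-overhead estimate and the tile-size search described above.

```python
import math
import time


def rhs(t, y):
    # Hypothetical test problem (an assumption): y' = -y for every component.
    return [-v for v in y]


def heun_two_pass(t, y, h):
    # Predictor (explicit Euler) and corrector (trapezoidal rule) as separate loops.
    f0 = rhs(t, y)
    pred = [yi + h * fi for yi, fi in zip(y, f0)]
    f1 = rhs(t + h, pred)
    return [yi + 0.5 * h * (a + b) for yi, a, b in zip(y, f0, f1)]


def heun_fused(t, y, h):
    # Same arithmetic as heun_two_pass, with the loops fused by hand
    # for this specific right-hand side.
    out = []
    for yi in y:
        f0 = -yi
        f1 = -(yi + h * f0)
        out.append(yi + 0.5 * h * (f0 + f1))
    return out


def solve_with_online_tuning(variants, y0, t0, t_end, h, trials_per_variant=3):
    """Integrate from t0 to t_end; the first steps double as timing trials.

    Returns the final state and the name of the selected variant
    (None if the run was too short to finish the tuning phase).
    """
    n_steps = round((t_end - t0) / h)
    # Tuning schedule: every variant gets trials_per_variant real time steps.
    schedule = [name for name in variants for _ in range(trials_per_variant)]
    timings = {name: [] for name in variants}
    best = None
    t, y = t0, list(y0)
    for k in range(n_steps):
        tuning = k < len(schedule)
        if tuning:
            name = schedule[k]
        else:
            if best is None:
                # Select the variant with the smallest measured step time.
                best = min(timings, key=lambda n: min(timings[n]))
            name = best
        start = time.perf_counter()
        y = variants[name](t, y, h)  # a tuning step still advances the solution
        elapsed = time.perf_counter() - start
        if tuning:
            timings[name].append(elapsed)
        t = t0 + (k + 1) * h
    return y, best


if __name__ == "__main__":
    variants = {"two_pass": heun_two_pass, "fused": heun_fused}
    y, best = solve_with_online_tuning(variants, [1.0] * 1000, 0.0, 1.0, 0.01)
    print("selected variant:", best)
    print("error vs exp(-1):", abs(y[0] - math.exp(-1.0)))
```

Because every variant computes the same time step, no work is discarded during tuning; the only cost is that some early steps run with a slower variant. Which variant is selected depends on the machine and the run, so the sketch prints it instead of assuming a winner.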