Journal: Journal of Computational Science

Graph-based multi-core higher-order time integration of linear autonomous partial differential equations

Abstract

Modern high-performance computing (HPC) systems rely on increasingly complex nodes with a steadily growing number of cores and correspondingly deep memory hierarchies. To fully exploit them, algorithms must be explicitly designed with these features in mind. In this work we address this challenge for a widely used class of application kernels: polynomial-based time integration of linear autonomous partial differential equations.

We build on prior work [1] on a cache-aware, yet sequential, solution and provide an innovative way to parallelize it while addressing cache-awareness across a large number of cores. For this, we introduce a dependency-graph-driven view of the algorithm and then use both static graph partitioning and dynamic scheduling to efficiently map the execution to the underlying platform. We implement our approach on top of the widely available Intel Threading Building Blocks (TBB) library, although the concepts are programming-model agnostic and can apply to any task-driven parallel programming approach.

We demonstrate the performance of our approach for 2nd-, 4th- and 6th-order time integration of the linear advection equation on three different architectures with widely varying memory systems and achieve a reduction in wall-clock time of up to 60% compared to a conventional, state-of-the-art non-cache-aware approach.
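To make the kernel class concrete, the following is a minimal sequential sketch of polynomial-based time integration for the linear advection equation u_t + c u_x = 0. It is not the paper's implementation: the abstract does not state which polynomial or spatial discretization is used, so a truncated Taylor expansion of the matrix exponential and a periodic central-difference stencil are assumed here, and all function names (`apply_L`, `taylor_step`, `integrate`) are hypothetical. It contains none of the cache-aware blocking, graph partitioning, or TBB scheduling described above.

```python
import math


def apply_L(u, c, dx):
    """Apply the assumed discrete operator L = -c * d/dx (central differences, periodic grid)."""
    n = len(u)
    # u[i - 1] wraps to the last element for i == 0, giving periodicity.
    return [-c * (u[(i + 1) % n] - u[i - 1]) / (2.0 * dx) for i in range(n)]


def taylor_step(u, dt, order, c, dx):
    """Advance one step: u <- sum_{k=0}^{order} (dt * L)^k / k! * u."""
    result = list(u)  # k = 0 term
    term = list(u)
    for k in range(1, order + 1):
        # Build (dt * L)^k u / k! from the previous term with one operator application.
        term = [dt / k * v for v in apply_L(term, c, dx)]
        result = [r + t for r, t in zip(result, term)]
    return result


def integrate(u0, dt, steps, order, c, dx):
    """Repeat the polynomial step; L is constant because the PDE is linear and autonomous."""
    u = list(u0)
    for _ in range(steps):
        u = taylor_step(u, dt, order, c, dx)
    return u


if __name__ == "__main__":
    n = 64
    dx = 1.0 / n
    u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(n)]
    # Advect a sine wave once around the periodic domain with a 4th-order polynomial.
    u = integrate(u0, dt=1.0 / 128, steps=128, order=4, c=1.0, dx=dx)
    print(max(abs(a - b) for a, b in zip(u, u0)))
```

Each polynomial term requires one further application of L to the previous term, so the terms form a chain of dependent stencil sweeps over the grid. This is the kind of dependency structure that a graph-driven, task-based formulation can partition and schedule across cores.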