Journal of Parallel and Distributed Computing

Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes


Abstract

Finite volume methods are widely used numerical strategies for solving partial differential equations. This paper aims to obtain a quantitative understanding of the achievable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes, using both traditional multicore CPUs and modern GPUs. We use an optimized implementation and a synthetic connectivity matrix whose structure consists exactly of equal-sized blocks on the main diagonal. This lets us closely relate the achievable computing performance to the size of these diagonal blocks. Moreover, we derive a theoretical model that identifies characteristic levels of attainable performance as a function of hardware parameters, from which a realistic upper limit of the performance can be predicted accurately. For real-world tetrahedral meshes, the key to high performance lies in reordering the tetrahedra so that the resulting connectivity matrix resembles a block diagonal form, where the optimal block size depends on the hardware. Numerical experiments confirm that the achieved performance is close to the practically attainable maximum. It reaches 75% of the theoretical upper limit, independent of the particular tetrahedral mesh considered. From these results, we develop a general model capable of identifying the bottleneck performance of a system's memory hierarchy in irregular applications.
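The abstract gives neither the flux kernel nor the reordering algorithm, so the following Python sketch is purely illustrative and rests on stated assumptions:

- Each tetrahedron is stored as a hypothetical list of its 4 face-neighbour indices.
- The synthetic mesh gives every cell neighbours only inside its own equal-sized diagonal block. It uses a ring-with-chords pattern chosen here for convenience, not taken from the paper.
- The update uses a simple diffusive face flux.
- A breadth-first, Cuthill-McKee-style renumbering stands in for whatever reordering the authors actually use.

The hypothetical helper `bandwidth` (the largest index distance between face neighbours) serves only as a crude proxy for memory locality.

```python
import random
from collections import deque


def synthetic_block_mesh(n_blocks, block_size):
    """Hypothetical synthetic mesh: every cell has 4 face neighbours, all inside
    its own diagonal block, so the connectivity matrix is exactly block diagonal.
    Neighbours are local index +-1 and +-2 (mod block_size), which is symmetric."""
    if block_size < 5:
        raise ValueError("need block_size >= 5 for 4 distinct neighbours")
    nbrs = []
    for c in range(n_blocks * block_size):
        blk, local = divmod(c, block_size)
        start = blk * block_size
        nbrs.append([start + (local + k) % block_size for k in (1, -1, 2, -2)])
    return nbrs


def bandwidth(nbrs):
    """Largest |i - j| over all face pairs: a rough proxy for access locality."""
    return max(abs(c - j) for c, row in enumerate(nbrs) for j in row)


def permute(nbrs, new_of_old):
    """Relabel cells so that old cell c becomes new_of_old[c]."""
    out = [None] * len(nbrs)
    for c, row in enumerate(nbrs):
        out[new_of_old[c]] = [new_of_old[j] for j in row]
    return out


def bfs_order(nbrs):
    """Breadth-first (Cuthill-McKee-style) renumbering, used here as a stand-in
    reordering: each connected component receives a contiguous index range."""
    new_of_old = [-1] * len(nbrs)
    nxt = 0
    for seed in range(len(nbrs)):
        if new_of_old[seed] != -1:
            continue
        new_of_old[seed] = nxt
        nxt += 1
        queue = deque([seed])
        while queue:
            c = queue.popleft()
            for j in nbrs[c]:
                if new_of_old[j] == -1:  # label cells in the order they are discovered
                    new_of_old[j] = nxt
                    nxt += 1
                    queue.append(j)
    return new_of_old


def fv_step(u, nbrs, coeff=0.1):
    """One explicit cell-centred update with an assumed diffusive face flux:
    u_c <- u_c + coeff * sum_f (u_nb - u_c). The gather u[j] is the irregular
    memory access whose locality depends on the cell ordering."""
    return [uc + coeff * sum(u[j] - uc for j in row) for uc, row in zip(u, nbrs)]


if __name__ == "__main__":
    B = 8
    mesh = synthetic_block_mesh(n_blocks=4, block_size=B)
    print("synthetic bandwidth:", bandwidth(mesh))  # 7 (= block_size - 1)

    # Scatter the labels, then let the BFS renumbering recover block locality.
    rng = random.Random(42)
    perm = list(range(len(mesh)))
    rng.shuffle(perm)
    shuffled = permute(mesh, perm)
    reordered = permute(shuffled, bfs_order(shuffled))
    print("shuffled bandwidth:", bandwidth(shuffled))
    print("reordered bandwidth:", bandwidth(reordered))

    # Symmetric face fluxes cancel pairwise, so the total is conserved.
    u = [float(i % 5) for i in range(len(mesh))]
    print("conserved:", abs(sum(fv_step(u, mesh)) - sum(u)) < 1e-9)
```

Shuffling the cell labels scatters each cell's neighbours across the whole array. The breadth-first renumbering then gathers every connected block back into a contiguous index range, so the bandwidth returns to at most `block_size - 1`. Choosing that block size for a given CPU or GPU is what the paper's hardware model addresses; this sketch does not attempt to model it.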