Conference: International Conference on High Performance Computing and Simulation

Performance Analysis of SIMD Vectorization of High-Order Finite-Element Kernels

Abstract

Physics-based three-dimensional numerical simulations are becoming more predictive and are already essential for improving the understanding of natural phenomena such as earthquakes, tsunamis, flooding, climate change and global warming. Among the numerical methods available to support these simulations, finite-element formulations have been implemented in several major software packages. The efficiency of these algorithms remains a challenge because irregular memory accesses prevent them from reaching the peak performance of current architectures. This is particularly true at the shared-memory level, which combines several levels of parallelism with complex memory hierarchies. Despite significant efforts, automatic optimizations provided by compilers and high-level frameworks often fall far short of the performance obtained from hand-tuned implementations. In this paper, we extract a kernel from the EFISPEC software package developed at BRGM (the French Geological Survey). This application implements a high-order finite-element method to solve the elastodynamic equation. We characterize the performance of the extracted mini-app with respect to key parameters such as the order of the approximation, the memory access pattern and the vector length. Based on this study, we detail specific optimizations and discuss the measured results with respect to the roofline performance model on Intel Broadwell and Skylake architectures.
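
The roofline model mentioned in the abstract bounds the attainable performance of a kernel by the smaller of the machine's peak compute rate and the product of the kernel's arithmetic intensity and the memory bandwidth. The Python sketch below illustrates that bound only. The function names and the machine numbers are placeholders chosen for illustration; they are not taken from the paper and are not measurements of Broadwell or Skylake processors.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under the basic roofline model.

    arithmetic_intensity: floating-point operations per byte moved (FLOP/byte)
    peak_gflops:          peak compute rate of the machine (GFLOP/s)
    bandwidth_gbs:        sustained memory bandwidth (GB/s)
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Placeholder machine parameters (assumptions, not real hardware figures).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for ai in (0.5, 5.0, 10.0, 20.0):
        bound = roofline_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai:5.1f} FLOP/byte -> at most {bound:7.1f} GFLOP/s ({regime})")
```

In this simplified picture, a kernel whose arithmetic intensity lies below the ridge point is limited by memory traffic, so wider vectors alone would not be expected to raise its ceiling; above the ridge point the compute peak becomes the limit.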