Vectorizing unstructured mesh computations for many-core architectures

Reguly I Z.; László Endre; Mudalige Gihan R.; Giles Mike B.

Abstract

Achieving optimal performance on the latest multi-core and many-core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and the Xeon-Phi on a key class of irregular applications: unstructured mesh computations. Using single instruction multiple thread (SIMT) and single instruction multiple data (SIMD) programming models, we show how unstructured mesh computations map to OpenCL or vector intrinsics through the use of code generation techniques in the OP2 Domain Specific Library and explore how irregular memory accesses and race conditions can be organized on different hardware. We benchmark Intel Xeon CPUs and the Xeon-Phi, using a tsunami simulation and a representative CFD benchmark. Results are compared with previous work on CPUs and NVIDIA GPUs to provide a comparison of achievable performance on current many-core systems. We show that auto-vectorization and the OpenCL SIMT model do not map efficiently to CPU vector units because of vectorization issues and threading overheads. In contrast, using SIMD vector intrinsics imposes some restrictions and requires more involved programming techniques but results in efficient code and near-optimal performance, two times faster than non-vectorized code. We observe that the Xeon-Phi does not provide good performance for these applications but is still comparable with a pair of mid-range Xeon chips. Copyright © 2015 John Wiley & Sons, Ltd.
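The abstract notes that irregular memory accesses and race conditions must be organized carefully when an unstructured mesh loop is vectorized. The following is a minimal sketch of that issue in plain Python, with lists standing in for vector lanes. It is not the OP2 generated code: the edge kernel, the function names, the vector width `VEC`, and the serialized scatter shown here are all illustrative assumptions.

```python
# Hypothetical illustration: an edge loop that increments data on the two
# nodes of each edge. Plain Python lists emulate SIMD lanes; nothing here
# is taken from OP2 itself.

VEC = 4  # assumed vector width (number of lanes)


def edge_flux(e, edges, node_val, edge_w):
    """Toy per-edge kernel: weighted difference of the two endpoint values."""
    a, b = edges[e]
    return edge_w[e] * (node_val[a] - node_val[b])


def edge_loop_scalar(edges, node_val, edge_w):
    """Reference version: one edge at a time, so no write conflicts."""
    res = [0.0] * len(node_val)
    for e, (a, b) in enumerate(edges):
        f = edge_flux(e, edges, node_val, edge_w)
        res[a] -= f
        res[b] += f
    return res


def edge_loop_naive_scatter(edges, node_val, edge_w, vec=VEC):
    """Emulates a vector gather-add-scatter. If two lanes target the same
    node, the later lane overwrites the earlier one and an update is lost."""
    res = [0.0] * len(node_val)
    for start in range(0, len(edges), vec):
        lanes = range(start, min(start + vec, len(edges)))
        flux = [edge_flux(e, edges, node_val, edge_w) for e in lanes]
        old_a = [res[edges[e][0]] for e in lanes]      # gather
        for e, old, f in zip(lanes, old_a, flux):      # scatter
            res[edges[e][0]] = old - f
        old_b = [res[edges[e][1]] for e in lanes]      # gather
        for e, old, f in zip(lanes, old_b, flux):      # scatter
            res[edges[e][1]] = old + f
    return res


def edge_loop_serial_scatter(edges, node_val, edge_w, vec=VEC):
    """Compute fluxes for a whole vector of edges, then apply the indirect
    increments lane by lane so that conflicting writes accumulate."""
    res = [0.0] * len(node_val)
    for start in range(0, len(edges), vec):
        lanes = range(start, min(start + vec, len(edges)))
        flux = [edge_flux(e, edges, node_val, edge_w) for e in lanes]
        for e, f in zip(lanes, flux):                  # serialized scatter
            a, b = edges[e]
            res[a] -= f
            res[b] += f
    return res


# Small example mesh in which node 0 is shared by three edges.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2)]
NODE_VAL = [1.0, 2.0, 4.0, 8.0]
EDGE_W = [1.0, 1.0, 1.0, 1.0]
```

On this example the serialized version reproduces the scalar result, while the naive scatter loses updates to node 0. Reordering or coloring edges so that no two lanes in a vector share a node is another common way to avoid the conflict.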

Bibliographic details

Source
Concurrency and computation: practice and experience | 2016, Issue 2 | pp. 557-577 | 21 pages
Authors
Reguly I Z.; László Endre; Mudalige Gihan R.; Giles Mike B.
Author affiliations

University of Oxford Oxford e‐Research Centre Oxford UK;

Pázmány Péter Catholic University Faculty of Information Technology and Bionics Budapest Hungary;

University of Oxford Oxford e‐Research Centre Oxford UK;

Pázmány Péter Catholic University Faculty of Information Technology and Bionics Budapest Hungary;

University of Oxford Oxford e‐Research Centre Oxford UK;

University of Oxford Oxford e‐Research Centre Oxford UK;

Original format: PDF
Language of text: English
Keywords
vectorization; Xeon Phi; AVX; CUDA; unstructured grid; programming abstraction;

Similar literature

1. Performance analysis of a 3D unstructured mesh hydrodynamics code on multi-core and many-core architectures [J]. Waltz J., Wohlbier J. G., Risinger L. D. International Journal for Numerical Methods in Fluids. 2015, Issue 6.
2. Architectural Support for Cilk Computations on Many-core Architectures [J]. Guoping Long, Dongrui Fan, Junchao Zhang. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2009, Issue 4.
3. Mesh Generation Using Unstructured Computational Meshes and Elliptic Partial Differential Equation Smoothing [J]. Steve L. Karman Jr., W. Kyle Anderson, Mandar Sahasrabudhe. AIAA Journal. 2006, Issue 6.
4. DEVELOPING A MINI-APP FOR EXPLORING ALGORITHMS FOR UNSTRUCTURED MESH DETERMINISTIC DISCRETE ORDINATES TRANSPORT ON MANY-CORE ARCHITECTURES [C]. Tom Deakin, Simon McIntosh-Smith, Justin Lovegrove. International Topical Meeting on Nuclear Reactor Thermal Hydraulics. 2019.
5. Computational Continua for Heterogeneous Solids: Studies on Unstructured Finite Element Meshes and on Wave Propagation [D]. Fafalis, Dimitrios. 2017.
6. High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures [O]. Daehyun Kim, Joshua Trzasko, Mikhail Smelyanskiy. 2011.
7. Vectorizing unstructured mesh computations for many-core architectures [O]. Reguly I Z., László Endre, Mudalige Gihan R. 2016.
8. Computational results for parallel unstructured mesh computations [R]. Jones, M. T., Plassmann, P. E. 1994.
