Exploiting GPU memory hierarchy for accelerating a specialized stencil computation

Thanasekhar Balaiah; Ranjani Parthasarathi


Abstract

Stencil computations are an important class of problems that can benefit from graphics processing units (GPUs). However, given the hierarchical and on-chip blocked memory organization of GPUs, memory performance degrades for specific data access patterns in stencils. An appropriate data layout is therefore needed to use the different levels of memory effectively and harvest the full potential of GPUs. In this context, a specialized stencil computation problem, the Lattice Boltzmann Method, which has a complex neighborhood relationship along with loop-carried dependence, is considered as a strong case study. Four different approaches for the Lattice Boltzmann Method have been developed in this work by exploiting the memory hierarchy with new data layouts and kernel organizations. These methods have been developed with the primary aim of increasing the compute-to-global-memory-access ratio and reducing the overall read-write latency, even at the expense of additional computations. The NVIDIA GPUs TitanX, GTX 960, GTX 740Ti, and GTX 650Ti have been used to test the proposed techniques. The compute-to-global-memory-access ratio improves by 2 to 10 times over the naive solutions in this work. The performance, in terms of time taken per iteration, improves by up to 3.7 times. The million lattice units per second for both the 2DQ9 and 3DQ19 models improve by more than 2 times.
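
As a rough illustration of why the Lattice Boltzmann Method counts as a stencil with a complex neighborhood, the following minimal sketch shows a pull-style streaming step for a nine-velocity 2D lattice, plus a throughput helper in millions of cell updates per second. It is not code from the paper: the function names `stream` and `mlups`, the periodic boundaries, and the nested-list layout `f[q][y][x]` are all assumptions chosen for readability, and the sketch runs on the CPU rather than reproducing any of the four GPU approaches.

```python
# Hypothetical illustration only; names, layout, and boundaries are assumptions.

# Nine lattice velocities: rest, four axis neighbours, four diagonal neighbours.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream(f, nx, ny):
    """Pull-style streaming step with periodic boundaries.

    f[q][y][x] is the distribution for direction q at cell (x, y).
    Every cell reads one value from each of nine neighbouring cells,
    which is the stencil access pattern.
    """
    out = [[[0.0] * nx for _ in range(ny)] for _ in VELOCITIES]
    for q, (cx, cy) in enumerate(VELOCITIES):
        for y in range(ny):
            for x in range(nx):
                # Read from the upstream neighbour in direction q.
                out[q][y][x] = f[q][(y - cy) % ny][(x - cx) % nx]
    return out


def mlups(nx, ny, iterations, seconds):
    """Millions of lattice cell updates per second for an nx-by-ny grid."""
    return nx * ny * iterations / seconds / 1e6
```

For example, a value placed in direction 1 at cell (0, 0) moves to cell (1, 0) after one call to `stream`, and the total over all distributions is unchanged. On a GPU, each of these neighbour reads would otherwise go to global memory, which is the cost the abstract's compute-to-global-memory-access ratio is concerned with.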

Bibliographic details

Source
Concurrency, practice and experience | 2017, issue 21 | e4267.1-e4267.18 | 18 pages
Authors
Thanasekhar Balaiah; Ranjani Parthasarathi
Affiliations

Department of Computer Technology, Anna University, Chennai, Tamilnadu, India;

Department of Information Science and Technology, Anna University, Chennai, Tamilnadu, India;

Original format: PDF
Language: English
Keywords
graphics processing unit; Lattice Boltzmann Method; shared memory; stencil computation; virtual extended block

