A new memory mapping mechanism for GPGPUs' stencil computation

Mo Tieqiang; Li Renfa

Abstract

When optimizing performance on a GPU, control flow divergence among the threads of a warp can become a performance bottleneck. In our hand-coded GPU stencil computation, the conventional mapping between global memory and shared memory introduces such divergence. To remove it, we devise a new mapping mechanism that models the coalesced memory accesses of GPU threads and the overhead of the aligned ghost zone. The mechanism eliminates the conditional statements otherwise needed for the boundary points of each XY-tile and thereby improves performance. In addition, we load only one XY-tile into registers in each stencil iteration, and we apply common sub-expression elimination and software prefetching to reduce overhead. A detailed performance evaluation shows that, with these optimizations, global memory access traffic is close to the idealized lower bound: per computed XY-tile point, the traffic is roughly 6% and 4% above the idealized lower bound of 8 bytes per point (which ignores ghost zone overhead) on Tesla C2050 and Kepler K20X, respectively.
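The traffic figures quoted above can be turned into bytes per point with a little arithmetic. The sketch below is illustrative only. `reported_traffic` simply applies the stated percentages to the 8-byte bound. `halo_traffic` is a hypothetical model that is not taken from the paper: it assumes each XY-tile reads its interior plus a ghost zone of width `radius` once and writes its interior once, with 4-byte elements. The function names and the 32 x 16 tile size are assumptions chosen for illustration.

```python
IDEAL_BYTES_PER_POINT = 8.0  # idealized lower bound quoted in the abstract


def reported_traffic(overhead_percent):
    """Bytes per point implied by 'roughly N % above the 8-byte bound'."""
    return IDEAL_BYTES_PER_POINT * (1.0 + overhead_percent / 100.0)


def halo_traffic(tile_x, tile_y, radius, elem_bytes=4):
    """Hypothetical model (an assumption, not the paper's formula):
    one read of the tile plus its ghost zone, one write of the interior."""
    loaded = (tile_x + 2 * radius) * (tile_y + 2 * radius)
    interior = tile_x * tile_y
    return elem_bytes * loaded / interior + elem_bytes


# Figures stated in the abstract, converted to bytes per point.
for gpu, pct in (("Tesla C2050", 6), ("Kepler K20X", 4)):
    print(f"{gpu}: {reported_traffic(pct):.2f} bytes per point")

# Assumed 32 x 16 tile with a one-point ghost zone, under the hypothetical model.
print(f"32x16 tile, radius 1: {halo_traffic(32, 16, 1):.3f} bytes per point")
```

With `radius=0` the hypothetical model reduces to the 8-byte bound, which is what "ghost zone overheads are not taken into consideration" means here. Any gap between this naive model and the reported figures should not be read as a result from the paper.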

Bibliographic record

Source: Computing, 2015, No. 8, pp. 795-812 (18 pages)
Authors: Mo Tieqiang; Li Renfa
Affiliation: Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China (listed for both authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Memory mapping; Control flow divergence; Stencil computation; Ghost zone; Memory access traffic; GPUs; Software prefetching; Coalesced memory; Shared memory; Memory bandwidth

