Cache Oblivious Parallelograms in Iterative Stencil Computations

Abstract

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements for constant and variable stencils against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo and demonstrate the clear superiority of our results.

The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from inside, we take an exterior approach with a predefined hierarchy, simple regular parallelogram tiles and a locality preserving parallelization. These advantages come at the cost of an irregular work-load distribution but a tightly integrated load-balancer ensures a high utilization of all resources.
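The parallelogram tiling idea can be illustrated on a much simpler case. The sketch below is not the authors' scheme: it assumes a 1D three-point averaging stencil with fixed endpoints and a single fixed tile width (so it is cache-aware rather than cache oblivious, with no hierarchy, parallelization or load balancing), and the function names are hypothetical. It shows plain time skewing: each parallelogram tile is advanced through all time steps before the next tile starts, and shifting the tile one cell to the left per step means every input a cell needs was produced either by the same tile one step earlier or by a tile further left.

```python
from typing import List


def naive_jacobi_1d(u0: List[float], steps: int) -> List[float]:
    """Reference sweep: apply the 3-point average `steps` times, endpoints held fixed."""
    n = len(u0)
    cur, nxt = list(u0), list(u0)  # both buffers carry the fixed boundary values
    for _ in range(steps):
        for x in range(1, n - 1):
            nxt[x] = (cur[x - 1] + cur[x] + cur[x + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def parallelogram_jacobi_1d(u0: List[float], steps: int, width: int) -> List[float]:
    """Hypothetical time-skewed version, computed tile by tile.

    At time step s, tile k covers cells [k*width - s, (k+1)*width - s), clipped
    to the interior. Because the range shifts one cell left per step, every
    input of a cell at step s was produced by this tile at step s - 1 or by a
    tile to its left, which has already finished all of its steps.
    """
    if width < 1:
        raise ValueError("width must be positive")
    n = len(u0)
    buf = [list(u0), list(u0)]  # buf[s % 2] receives time level s
    num_tiles = (n + steps) // width + 1  # enough tiles to cover the rightmost shifted range
    for k in range(num_tiles):  # tiles are processed strictly left to right
        for s in range(1, steps + 1):  # one tile runs through every time step
            lo = max(1, k * width - s)
            hi = min(n - 1, (k + 1) * width - s)
            src, dst = buf[(s - 1) % 2], buf[s % 2]
            for x in range(lo, hi):
                dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
    return buf[steps % 2]


if __name__ == "__main__":
    u0 = [0.0] * 10 + [1.0] + [0.0] * 10
    tiled = parallelogram_jacobi_1d(u0, steps=7, width=4)
    print("tiled result matches naive sweep:", tiled == naive_jacobi_1d(u0, steps=7))
```

Each point is evaluated with the same expression on the same inputs in both functions, so the results agree exactly. The benefit lies in the traversal order: a single tile touches only about width + steps cells of each buffer across all of its time steps, so its working set can stay in cache however large the domain is, instead of streaming the whole array from memory once per time step.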

Bibliographic record

Source: 24th ACM International Conference on Supercomputing 2010, 2010, pp. 49-59 (11 pages)
Conference location: Amsterdam (NL)
Authors: Robert Strzodka; Mohammed Shaheen; Dawid Pajak; Hans-Peter Seidel
Affiliations:
Max Planck Institut Informatik, Campus E1 4, Saarbruecken, Germany
Max Planck Institut Informatik, Campus E1 4, Saarbruecken, Germany
West Pomeranian University of Technology, Żołnierska 49, Szczecin, Poland
Max Planck Institut Informatik, Campus E1 4, Saarbruecken, Germany
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords: memory wall; memory bound; stencil; time skewing; temporal blocking; cache oblivious; parallelism and locality

Similar documents

1. The memory behavior of cache oblivious stencil computations [J]. Matteo Frigo, Volker Strumpen. Journal of Supercomputing, 2007, no. 2.

2. Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs [J]. Rezaul A. Chowdhury, Vijaya Ramachandran. ACM Transactions on Algorithms, 2018, no. 1.

3. Multi-level spatial and temporal tiling for efficient HPC stencil computation on many-core processors with large shared caches [J]. Charles Yount, Alejandro Duran, Josh Tobin. Future Generation Computer Systems, 2019, issue MARa.

4. Cache Oblivious Parallelograms in Iterative Stencil Computations [C]. Robert Strzodka, Mohammed Shaheen, Dawid Pajak. ACM International Conference on Supercomputing, 2010.

5. Towards Automatic Compilation for Energy Efficient Iterative Stencil Computations [D]. Yun Zou. 2016.

6. Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER [O]. Miguel Ferreira, Nuno Roma, Luis MS Russo. 2014.

7. Cache Oblivious Parallelograms in Iterative Stencil Computations [O]. Robert Strzodka, Mohammed Shaheen, Hans-Peter Seidel. 2011.
