Conference: 24th ACM International Conference on Supercomputing, 2010

Cache Oblivious Parallelograms in Iterative Stencil Computations


Abstract

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations, as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double-precision elements, for both constant and variable stencils, against hand-optimized naive code and against PluTo, an automatic polyhedral parallelizer and locality optimizer, and demonstrate that our scheme clearly outperforms both.

The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from the inside, we take an exterior approach with a predefined hierarchy, simple regular parallelogram tiles and a locality-preserving parallelization. These advantages come at the cost of an irregular workload distribution, but a tightly integrated load balancer ensures high utilization of all resources.
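The abstract does not spell out the algorithm, so what follows is only an illustrative sketch of the general idea behind cache-oblivious space-time tiling, not the parallelogram scheme described above. It uses the earlier trapezoidal recursion of Frigo and Strumpen (ICS 2005) on an assumed 1D three-point averaging stencil with fixed boundary values; the functions `naive`, `cache_oblivious`, `walk` and `kernel` are hypothetical names. The recursion cuts the (x, t) region in space when it is wide and in time when it is tall, so sub-regions keep shrinking until they fit in each cache level, without any cache-size parameter.

```python
"""Illustrative sketch only: cache-oblivious space-time recursion for a 1D stencil.

Assumption: hypothetical stencil u'[x] = 0.25*u[x-1] + 0.5*u[x] + 0.25*u[x+1]
with fixed boundary cells. This follows Frigo and Strumpen's trapezoidal
recursion, NOT the paper's parallelogram hierarchy or load balancer.
"""
from typing import List


def naive(u0: List[float], steps: int) -> List[float]:
    """Reference version: sweep the whole grid once per time step."""
    u = list(u0)
    for _ in range(steps):
        nxt = list(u)  # boundary cells u[0] and u[-1] stay fixed
        for x in range(1, len(u) - 1):
            nxt[x] = 0.25 * u[x - 1] + 0.5 * u[x] + 0.25 * u[x + 1]
        u = nxt
    return u


def cache_oblivious(u0: List[float], steps: int) -> List[float]:
    """Same result as naive(), but visits (t, x) points in recursive tiles."""
    n = len(u0)
    buf = [list(u0), list(u0)]  # buf[t % 2] holds the grid at time step t

    def kernel(t: int, x: int) -> None:
        # Compute point x of time step t + 1 from time step t.
        src, dst = buf[t % 2], buf[(t + 1) % 2]
        dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]

    def walk(t0: int, t1: int, x0: int, dx0: int, x1: int, dx1: int) -> None:
        # Trapezoid: at time t the x-range is [x0 + dx0*(t-t0), x1 + dx1*(t-t0)).
        dt = t1 - t0
        if dt == 1:
            for x in range(x0, x1):
                kernel(t0, x)
        elif dt > 1:
            if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
                # Wide: split along x = xm - (t - t0). The left piece never
                # reads values from the right piece, so it runs first.
                xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
                walk(t0, t1, x0, dx0, xm, -1)
                walk(t0, t1, xm, -1, x1, dx1)
            else:
                # Tall: split in time and finish the lower half first.
                s = dt // 2
                walk(t0, t0 + s, x0, dx0, x1, dx1)
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

    walk(0, steps, 1, 0, n - 1, 0)  # interior points only; dx = 0 means vertical edges
    return buf[steps % 2]
```

Both functions return identical values, because the recursion only reorders the (t, x) updates while respecting their dependencies, and for a three-point stencil any such order is also safe with two alternating buffers. Sub-regions whose two edges both have slope -1 are parallelograms in the (x, t) plane; the paper's predefined tile hierarchy, vectorization, parallelization and load balancing go well beyond this serial sketch.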
