
Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates



Abstract

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
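To make the diamond tiling idea concrete, here is a minimal sequential sketch for a 1D three-point Jacobi stencil with fixed boundaries. It is an illustration under stated assumptions, not the scheme from the paper: the stencil, the function names `jacobi_naive` and `jacobi_diamond`, and the `width` parameter are all hypothetical, and the multicore wavefront and thread-group parts are left out entirely. It only shows that the updates can be reordered into diamond-shaped space-time tiles, so several time steps are applied to a small block while it could stay in cache, without changing the result.

```python
def jacobi_naive(u0, steps):
    """Reference: sweep the whole grid once per time step."""
    old, new = list(u0), list(u0)
    for _ in range(steps):
        for x in range(1, len(old) - 1):
            new[x] = (old[x - 1] + old[x] + old[x + 1]) / 3.0
        old, new = new, old
    return old


def jacobi_diamond(u0, steps, width):
    """Same updates, reordered into diamond tiles in space-time.

    Illustrative only: 1D, sequential, assumed 3-point averaging stencil.
    """
    n = len(u0)
    buf = [list(u0), list(u0)]  # time level t lives in buf[t % 2]
    # Skewed coordinates a = t + x, b = t - x. Every dependence of a
    # 3-point stencil is non-negative in (a, b), so square tiles in
    # (a, b) are diamonds in (t, x) and may run in lexicographic order.
    a_tiles = range(2 // width, (steps + n - 2) // width + 1)
    b_tiles = range((3 - n) // width, (steps - 1) // width + 1)
    for ta in a_tiles:
        for tb in b_tiles:
            # Inside one tile, advance in time; rows outside the tile
            # produce an empty x range and are skipped.
            for t in range(1, steps + 1):
                lo = max(1, ta * width - t, t - (tb + 1) * width + 1)
                hi = min(n - 2, (ta + 1) * width - t - 1, t - tb * width)
                old, new = buf[(t - 1) % 2], buf[t % 2]
                for x in range(lo, hi + 1):
                    new[x] = (old[x - 1] + old[x] + old[x + 1]) / 3.0
    return buf[steps % 2]
```

Both functions perform the same arithmetic on the same inputs for every grid point, so their results should match exactly; only the traversal order differs. In a real implementation the tile size would be chosen so that a tile's working set fits in cache, and independent tiles would be distributed across threads.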
