A Predictive Performance Model for Stencil Codes on Multicore CPUs

Abstract

In this paper we present an analytical performance model that estimates the performance of stencil-based simulations. Unlike previous models, ours neither relies on prototype implementations nor examines computational intensity alone. The model accounts for memory optimizations such as cache blocking and non-temporal stores, and it also covers multi-threading, loop unrolling, and vectorization. It is built from a sequence of 1D loops. For each loop we map the different parts of the instruction stream to the corresponding CPU pipelines and estimate their throughput. The load/store streams may be affected not only by their destination (the cache level or NUMA domain they target) but also by concurrent accesses from other threads. Evaluation with a Jacobi solver and the Himeno benchmark shows that the model is accurate enough to capture real-life kernels.
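
The per-loop idea described above can be illustrated with a small sketch. It is not the model from the paper: it assumes a simple bottleneck rule (the slowest of the arithmetic, load, store, and data-transfer streams sets the loop time) and splits a shared bandwidth evenly among threads. All class names, parameters, and machine numbers are hypothetical. The 24 and 16 bytes per update correspond to a double-precision Jacobi sweep with and without the write-allocate transfer, assuming ideal cache blocking.

```python
from dataclasses import dataclass


@dataclass
class Loop1D:
    """Per-iteration cost of one 1D loop (illustrative values only)."""
    flops: float        # arithmetic operations per iteration
    loads: float        # load instructions per iteration
    stores: float       # store instructions per iteration
    bytes_moved: float  # bytes exchanged with the targeted memory level


@dataclass
class Machine:
    """Hypothetical machine description; not taken from the paper."""
    clock_hz: float               # core clock frequency
    flops_per_cycle: float        # arithmetic pipeline throughput per core
    loads_per_cycle: float        # load pipeline throughput per core
    stores_per_cycle: float       # store pipeline throughput per core
    bandwidth_bytes_per_s: float  # bandwidth of the target level, shared by all threads


def cycles_per_iteration(loop: Loop1D, machine: Machine, threads: int = 1) -> float:
    """Estimate cycles per iteration as the slowest of the competing streams."""
    compute = loop.flops / machine.flops_per_cycle
    load = loop.loads / machine.loads_per_cycle
    store = loop.stores / machine.stores_per_cycle
    # Assumption: concurrent threads share the bandwidth evenly.
    bytes_per_cycle = machine.bandwidth_bytes_per_s / machine.clock_hz / threads
    transfer = loop.bytes_moved / bytes_per_cycle
    return max(compute, load, store, transfer)


def updates_per_second(loop: Loop1D, machine: Machine, threads: int = 1) -> float:
    """Aggregate stencil updates per second over all threads."""
    return threads * machine.clock_hz / cycles_per_iteration(loop, machine, threads)


# Hypothetical machine: 2 GHz cores, 32 GB/s of shared memory bandwidth.
machine = Machine(clock_hz=2e9, flops_per_cycle=4, loads_per_cycle=2,
                  stores_per_cycle=1, bandwidth_bytes_per_s=32e9)

# Jacobi-like update in double precision; instruction counts are assumed.
# 24 bytes = 8 (load) + 8 (write-allocate) + 8 (store).
jacobi = Loop1D(flops=8, loads=7, stores=1, bytes_moved=24)
# Non-temporal stores avoid the write-allocate transfer: 16 bytes per update.
jacobi_nt = Loop1D(flops=8, loads=7, stores=1, bytes_moved=16)

if __name__ == "__main__":
    for threads in (1, 4):
        for name, loop in (("regular stores", jacobi), ("non-temporal", jacobi_nt)):
            rate = updates_per_second(loop, machine, threads)
            print(f"{threads} thread(s), {name}: {rate / 1e9:.2f} G updates/s")
```

In this toy setting a single thread is limited by the load pipeline, while four threads saturate the shared bandwidth, so only the multi-threaded case benefits from non-temporal stores. The actual model distinguishes more effects (cache levels, NUMA domains, unrolling, vectorization) than this sketch does.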