
Vector Folding: Improving Stencil Performance via Multi-dimensional SIMD-vector Representation

Abstract

Stencil computation is an important class of algorithms used in a large variety of scientific-simulation applications. Modern CPUs employ increasingly long SIMD vector registers and operations to improve computational throughput. However, the traditional use of vectors to hold sequential data elements along one dimension is not always the most efficient representation, especially in the multicore and hyper-threaded context, where caches are shared among many simultaneous compute streams. This paper presents a general technique for representing data in vectors for 2D and 3D stencils. By storing a small multi-dimensional block of data in each vector, rather than the single-dimension run used in the traditional approach, the method reduces the number of memory accesses required. Experiments on an Intel Xeon Phi Coprocessor show performance speedups over traditional vectors ranging from 1.2x to 2.7x, depending on the problem size and stencil type. This technique is independent of and complementary to a variety of existing stencil-computation tuning algorithms such as cache blocking, loop tiling, and wavefront parallelization.
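To make the memory-access claim concrete, the following is a minimal sketch, not the paper's implementation, that counts how many distinct aligned vector blocks must be read to compute one output vector of a 2D star-shaped stencil. It assumes 16-lane vectors, aligned blocks, and a star stencil of a given radius; the function names `star_offsets` and `vectors_touched` are hypothetical. The count is only a rough proxy for memory traffic and ignores the in-register shuffles a folded layout needs.

```python
from itertools import product


def star_offsets(radius, ndim=2):
    """Offsets of an axis-aligned (star) stencil, centre point included."""
    offsets = {(0,) * ndim}
    for axis in range(ndim):
        for d in range(-radius, radius + 1):
            off = [0] * ndim
            off[axis] = d
            offsets.add(tuple(off))
    return offsets


def vectors_touched(fold, offsets):
    """Count distinct aligned vector blocks read to compute one output block.

    `fold` gives the block shape stored in one vector, e.g. (16, 1) for the
    traditional 1D layout or (4, 4) for a 2D fold (both hold 16 elements).
    """
    touched = set()
    # Visit every element of the output block at the origin...
    for point in product(*(range(n) for n in fold)):
        # ...and record which aligned block each stencil input falls in.
        for off in offsets:
            block = tuple((p + o) // n for p, o, n in zip(point, off, fold))
            touched.add(block)
    return len(touched)


if __name__ == "__main__":
    for radius in (2, 4):
        offs = star_offsets(radius)
        one_d = vectors_touched((16, 1), offs)
        folded = vectors_touched((4, 4), offs)
        print(f"radius {radius}: 1D layout reads {one_d} blocks, "
              f"4x4 fold reads {folded} blocks")
```

Under these assumptions the 1D layout's block count grows with the stencil radius along the non-vectorized axis, while the 4x4 fold keeps the neighbourhood compact. Real speedups depend on cache behaviour and shuffle cost, which this count does not model.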
