Vectorization for SIMD Architectures with Alignment Constraints


Abstract

When vectorizing for the SIMD architectures commonly employed by today's multimedia extensions, one of the new challenges that arises is the handling of memory alignment. Prior research has focused primarily on vectorizing loops where all memory references are properly aligned. An important aspect of this problem, namely how to vectorize misaligned memory references, remains unaddressed. This paper presents a compilation scheme that systematically vectorizes loops in the presence of misaligned memory references. The core of our technique is to automatically reorganize data in registers to satisfy the alignment requirement imposed by the hardware. To reduce the data reorganization overhead, we propose several techniques to minimize the number of data reorganization operations generated. During code generation, our algorithm also exploits temporal reuse when aligning references that access contiguous memory across loop iterations. Our code generation scheme guarantees that the data associated with a single static access is never loaded twice. Experimental results indicate near-peak speedup factors, e.g., 3.71 for 4 data per vector and 6.06 for 8 data per vector, for a set of loops where 75% or more of the static memory references are misaligned.
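
The idea of reorganizing data in registers and reusing loaded vectors across iterations can be illustrated with a minimal sketch. This is not the paper's algorithm or generated code: it simulates a 4-element vector register in portable C, and every name in it (`vec_t`, `load_aligned`, `shift_pair`, `copy_misaligned`) is hypothetical. It assumes the hardware only supports loads at vector-aligned indices, that the misalignment `off` is known and satisfies 0 <= off < V, and that the input buffer is padded by one extra vector.

```c
#include <assert.h>
#include <string.h>

#define V 4 /* elements per simulated vector register (assumption) */

typedef struct { int lane[V]; } vec_t;

/* Simulated aligned load: idx must be a multiple of V. */
static vec_t load_aligned(const int *base, int idx) {
    vec_t v;
    assert(idx % V == 0);
    memcpy(v.lane, base + idx, sizeof v.lane);
    return v;
}

/* Treat lo:hi as one 2*V-lane value and extract V lanes starting at
 * `shift`. This stands in for a hardware permute or shift instruction. */
static vec_t shift_pair(vec_t lo, vec_t hi, int shift) {
    vec_t r;
    for (int i = 0; i < V; i++) {
        int k = i + shift;
        r.lane[i] = (k < V) ? lo.lane[k] : hi.lane[k - V];
    }
    return r;
}

/* Computes out[i] = in[i + off] for i in [0, n) using aligned loads only.
 * Requires: n is a multiple of V, 0 <= off < V, and `in` holds at least
 * n + V elements. */
static void copy_misaligned(int *out, const int *in, int off, int n) {
    assert(off >= 0 && off < V && n % V == 0);
    vec_t prev = load_aligned(in, 0);          /* loaded once, before the loop */
    for (int i = 0; i < n; i += V) {
        vec_t next = load_aligned(in, i + V);  /* the only load per iteration */
        vec_t r = shift_pair(prev, next, off); /* realign in registers */
        memcpy(out + i, r.lane, sizeof r.lane);/* aligned store */
        prev = next;                           /* temporal reuse: no reload */
    }
}
```

In this sketch each aligned input vector is loaded exactly once: the vector that supplies the high lanes in one iteration is kept in `prev` and supplies the low lanes in the next. On real hardware the `shift_pair` step would map to a permute or byte-shift instruction rather than a scalar loop.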
