
Efficient SIMD Code Generation for Irregular Kernels


Abstract

Array indirection poses several challenges for compilers that target single instruction, multiple data (SIMD) instructions. The main ones are disjoint memory references, arbitrarily misaligned memory references, and dependence cycles in loops. Because of these challenges, existing SIMD compilers exclude loops with array indirection from their candidates for SIMD vectorization. Addressing them is nevertheless unavoidable, since many important compute-intensive applications use array indirection extensively to reduce memory and computation requirements. In this work, we propose a method to generate efficient SIMD code for loops containing indirect memory references. We extract both inter- and intra-iteration parallelism, taking data reorganization overhead into consideration. We also place data reorganization code optimally, so that the reorganization overhead is amortized by the performance gain of SIMD vectorization. Experiments on four array indirection kernels extracted from real-world scientific applications show that the proposed method effectively generates SIMD code for irregular kernels with array indirection. Compared to existing SIMD vectorization methods, the proposed method improves the performance of irregular kernels by 91% on average.
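The four kernels evaluated in the paper are not reproduced on this page. As an illustration only, the following minimal C sketch shows the kind of loop the abstract describes. The function name `scatter_add` and all parameter names are hypothetical and do not come from the paper. The read `x[src[i]]` and the update `y[dst[i]]` both go through index arrays, so consecutive iterations touch disjoint, arbitrarily aligned addresses, and repeated values in `dst` create dependences between iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irregular kernel (not taken from the paper).
 * Both memory streams are indexed through another array, which is
 * what "array indirection" refers to in the abstract. */
void scatter_add(double *y, size_t y_len,
                 const double *x, size_t x_len,
                 const double *a,
                 const int *src, const int *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Guard the indirect indices so the sketch fails loudly on bad input. */
        assert(src[i] >= 0 && (size_t)src[i] < x_len);
        assert(dst[i] >= 0 && (size_t)dst[i] < y_len);

        /* x[src[i]]: indirect read; neighboring iterations may load from
         * unrelated, misaligned addresses (a gather-style access).
         * y[dst[i]]: indirect read-modify-write; if dst[i] == dst[j] for
         * i != j, the two iterations must not be updated independently. */
        y[dst[i]] += a[i] * x[src[i]];
    }
}
```

A straightforward vectorizer cannot simply load `x` and store `y` with contiguous vector accesses here; the elements first have to be collected into vector registers and later written back individually, which is the data reorganization overhead the abstract refers to.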
