Journal: ACM Transactions on Parallel Computing

Extracting SIMD Parallelism from Recursive Task-Parallel Programs


Abstract

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to execute data-parallel computations in a vectorized manner efficiently. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This article presents a set of novel code transformations that expose the data parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel's SSE4.2 vector units, as well as accelerators using Intel's AVX512 units. We then show through rigorous sampling that, in practice, our vectorization techniques are effective for a much larger class of programs.
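To make the idea of data parallelism hidden in a recursive computation concrete, here is a minimal sketch. It is not the article's transformation, only an illustration under stated assumptions: the function names `fib_recursive` and `fib_blocked` and the `max_block` parameter are hypothetical, and Fibonacci is chosen purely because it is a familiar divide-and-conquer example. The blocked version gathers pending calls into dense blocks so that every task in a block performs the same operation, which is the shape a vectorizing compiler or SIMD unit would need. Capping the block size hints at the tension between keeping vector lanes full and limiting space.

```python
def fib_recursive(n):
    """Plain divide-and-conquer: each call spawns two independent child tasks."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_blocked(n, max_block=8):
    """Illustrative re-expression: run pending tasks in dense blocks.

    `max_block` is a hypothetical knob standing in for a vector width or a
    space budget; it is not taken from the article.
    """
    stack = [[n]]  # stack of task blocks; each block holds call arguments
    total = 0      # reduction over the values returned by base-case tasks
    while stack:
        block = stack.pop()
        # Data-parallel step 1: every task applies the same base-case test.
        total += sum(arg for arg in block if arg < 2)
        # Data-parallel step 2: every surviving task spawns the same two children.
        live = [arg for arg in block if arg >= 2]
        children = [arg - 1 for arg in live] + [arg - 2 for arg in live]
        # Re-block the children so no block exceeds max_block tasks.
        for start in range(0, len(children), max_block):
            stack.append(children[start:start + max_block])
    return total


if __name__ == "__main__":
    print(fib_recursive(10), fib_blocked(10))
```

Both functions compute the same value because the result of the recursion equals the sum of the values returned at its base cases. In this sketch the two list comprehensions over `block` are the loops that would map onto vector lanes in a compiled language; pure Python does not vectorize them.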
