Extracting SIMD Parallelism from Recursive Task-Parallel Programs

BIN REN; SHRUTHI BALAKRISHNA; YOUNGJOON JO; SRIRAM KRISHNAMOORTHY; KUNAL AGRAWAL; MILIND KULKARNI


Abstract

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to execute data-parallel computations in a vectorized manner efficiently. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This article presents a set of novel code transformations that expose the data parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel's SSE4.2 vector units, as well as accelerators using Intel's AVX512 units. We then show through rigorous sampling that, in practice, our vectorization techniques are effective for a much larger class of programs.
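The abstract does not spell out the transformations themselves. As a rough illustration of how a recursive, task-parallel computation can hide data parallelism, the Python sketch below rewrites a recursive Fibonacci so that all tasks at one level of the recursion tree are held in a single block and processed together. The function names `fib_recursive` and `fib_blocked` and the breadth-first strategy are assumptions chosen for illustration, not the article's actual transformations or scheduling policies.

```python
def fib_recursive(n):
    """Plain recursive formulation: each call corresponds to one task."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_blocked(n):
    """Breadth-first formulation (illustrative assumption, not the paper's code).

    All tasks at the same recursion level sit in one block, so the same
    operation is applied to every element of the block at once.
    """
    total = 0
    block = [n]  # tasks at the current level of the recursion tree
    while block:
        # Base-case tasks (argument < 2) contribute their value to the result.
        total += sum(t for t in block if t < 2)
        # Every other task spawns two children; this uniform step over the
        # whole block is the part a vector unit could execute in lockstep.
        block = [child for t in block if t >= 2 for child in (t - 1, t - 2)]
    return total


if __name__ == "__main__":
    print(fib_blocked(10) == fib_recursive(10))
```

In this sketch the block can grow exponentially with the recursion depth, which hints at why the abstract pairs the transformations with scheduling policies that limit space usage while keeping vector resources busy.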

Bibliographic record

Source
ACM Transactions on Parallel Computing | 2019, Issue 4 | pp. 24.1-24.37 | 37 pages
Authors
BIN REN; SHRUTHI BALAKRISHNA; YOUNGJOON JO; SRIRAM KRISHNAMOORTHY; KUNAL AGRAWAL; MILIND KULKARNI;
Author affiliations

William & Mary Pacific Northwest National Laboratory;

Purdue University;

Pacific Northwest National Laboratory;

Washington University in St. Louis;

Original format: PDF
Language: English
Keywords
Recursive programs; task parallelism; vectorization;

