PFACC: An OpenACC-like programming model for irregular nested parallelism

Huang Ming Hsiang; Yang Wuu

Abstract

OpenACC is a directive-based programming model that allows programmers to write graphics processing unit (GPU) programs by simply annotating parallel loops. However, OpenACC has poor support for irregular nested parallel loops, which are a natural way to express nested parallelism. We propose PFACC, a programming model similar to OpenACC. PFACC directives can be used to annotate parallel loops and to guide data movement between different levels of the memory hierarchy. Parallel loops can be arbitrarily nested or placed inside functions that are called, possibly recursively, from other parallel loops. The PFACC translator translates C programs with PFACC directives into CUDA programs by inserting runtime iteration-sharing and memory allocation routines. The PFACC runtime iteration-sharing routine is a two-level mechanism. Thread blocks dynamically organize loop iterations into batches and execute the batches in a depth-first order. Different thread blocks share iterations among one another with an iteration-stealing mechanism. PFACC generates CUDA programs with reasonable memory usage because of the depth-first execution order. The two-level iteration-sharing mechanism is implemented purely in software and fits well with the CUDA thread hierarchy. Experiments show that PFACC outperforms CUDA dynamic parallelism in terms of performance and code size on most benchmarks.
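For readers unfamiliar with the term, the following C sketch shows what an irregular nested parallel loop can look like when annotated with standard OpenACC directives. It is not PFACC syntax, which this record does not show. The CSR-style layout and the names `row_sums`, `row_ptr`, `vals`, and `row_sums_selfcheck` are assumptions made for illustration only. The inner loop runs a different number of iterations for each row, which is the kind of imbalance the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: sums each row of a CSR-style sparse matrix.
   row_ptr has n_rows + 1 entries; row i owns vals[row_ptr[i]] up to,
   but not including, vals[row_ptr[i + 1]]. The inner trip count varies
   per row, which makes the loop nest irregular. A compiler without
   OpenACC support ignores the directives and runs the loops serially. */
void row_sums(size_t n_rows, const size_t *row_ptr,
              const double *vals, double *out)
{
    size_t nnz = row_ptr[n_rows];   /* total number of stored values */
    (void)nnz;                      /* only used in the data clauses below */

    #pragma acc parallel loop copyin(row_ptr[0:n_rows + 1], vals[0:nnz]) copyout(out[0:n_rows])
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Inner parallel loop whose length depends on the row. */
        #pragma acc loop reduction(+:sum)
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += vals[k];
        }
        out[i] = sum;
    }
}

/* Small self-check with rows of length 3, 0, and 1. Returns 1 on success. */
int row_sums_selfcheck(void)
{
    const size_t row_ptr[] = {0, 3, 3, 4};
    const double vals[] = {1.0, 2.0, 3.0, 4.5};
    double out[3] = {-1.0, -1.0, -1.0};

    row_sums(3, row_ptr, vals, out);

    assert(out[0] == 6.0);
    assert(out[1] == 0.0);
    assert(out[2] == 4.5);
    return 1;
}
```

With GCC, building with `-fopenacc` enables the directives; without that flag the same file still compiles as plain C and produces the same sums.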

Bibliographic record

Source
Software, practice & experience | 2020, issue 10 | pp. 1877-1904 | 28 pages
Authors
Huang Ming Hsiang; Yang Wuu
Author affiliations
Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan (both authors)
Indexed in
Science Citation Index (SCI); Engineering Index (EI)
Original format
PDF
Language
English
Keywords
dynamic scheduling; GPGPU; irregular parallelism; nested parallelism; OpenACC; parallel programming model; PFACC

Similar literature

1. Adjusting Thread Parallelism Dynamically to Accelerate Dynamic Programming with Irregular Workload Distribution on GPGPUs [J]. Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin. International journal of grid and high performance computing. 2014, issue 1.
2. Programming the Memory Hierarchy Revisited: Supporting Irregular Parallelism in Sequoia [J]. Michael Bauer, John Clark, Eric Schkufza. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2011, issue 8.
3. A compiler for exploiting nested parallelism in OpenMP programs [J]. Xinmin Tian, Jay P. Hoeflinger, Grant Haab. Parallel Computing. 2005, issue 10-12.
4. Exploring thread-level parallelism based on cost-driven model for irregular programs [C]. Yuancheng Li, Bin Liu. IEEE International Conference on Signal Processing, Communications and Computing. 2017.
5. Enabling Efficient Parallelism for Applications with Dependences and Irregular Memory Accesses [D]. Jiang, Peng. 2019.
6. IOPA: I/O-aware parallelism adaption for parallel programs [O]. Tao Liu, Yi Liu, Chen Qian. 2012.
