Fine-grain Task Aggregation and Coordination on GPUs

Abstract

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced fine-grain work into coarser-grain tasks. However, no practical implementation has been proposed. To this end, we propose and evaluate the first channel implementation. To demonstrate the utility of channels, we present a case study that maps the fine-grain, recursive task spawning in the Cilk programming language to channels by representing it as a flow graph. To support data-parallel recursion in bounded memory, we propose a hardware mechanism that allows wavefronts to yield their execution resources. Through channels and wavefront yield, we implement four Cilk benchmarks. We show that Cilk can scale with the GPU architecture, achieving speedups of as much as 4.3x on eight compute units.
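
As a rough illustration of the aggregation idea described in the abstract, the sketch below buffers individually produced work items and hands them to a consumer in fixed-size batches. It is a CPU-only toy and not the authors' implementation: the names `Channel`, `push`, `flush`, and `batch_size` are hypothetical, and the default batch size of 64 is an assumed wavefront width that the abstract does not state.

```python
from collections import deque

WAVEFRONT_SIZE = 64  # assumed lane count per wavefront; not given in the abstract


class Channel:
    """Toy channel: collects fine-grain items and dispatches them in batches."""

    def __init__(self, consumer, batch_size=WAVEFRONT_SIZE):
        self.consumer = consumer      # function applied to each aggregated batch
        self.batch_size = batch_size  # items needed before a batch is dispatched
        self.pending = deque()        # work produced asynchronously, not yet run
        self.launches = []            # size of every batch dispatched so far

    def push(self, item):
        """Producer side: enqueue one fine-grain work item."""
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self._dispatch(self.batch_size)

    def flush(self):
        """Drain leftovers, possibly as a final partial batch."""
        while self.pending:
            self._dispatch(min(self.batch_size, len(self.pending)))

    def _dispatch(self, n):
        batch = [self.pending.popleft() for _ in range(n)]
        self.launches.append(n)
        self.consumer(batch)


# Usage: ten items pushed one at a time are consumed as batches of 4, 4, and 2.
results = []
ch = Channel(lambda batch: results.extend(x * x for x in batch), batch_size=4)
for i in range(10):
    ch.push(i)
ch.flush()
```

Dispatching only full batches until `flush` is called mirrors the goal of keeping every lane of a data-parallel launch busy; a real GPU channel would also have to handle concurrent producers, which this sketch ignores.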

Bibliographic record

Source
ACM/IEEE International Symposium on Computer Architecture | 2014 | 12 pages
Authors
Marc S. Orr; Bradford M. Beckmann; Steven K. Reinhardt; David A. Wood
Original format: PDF
Chinese Library Classification: TP303-53
Keywords
Coordination; GPGPU; SIMT
