Conference: ACM/IEEE International Symposium on Computer Architecture

Fine-grain Task Aggregation and Coordination on GPUs

Abstract

In general-purpose graphics processing unit (GPGPU) computing, data is processed by concurrent threads executing the same function. This model, dubbed single-instruction/multiple-thread (SIMT), requires programmers to coordinate the synchronous execution of similar operations across thousands of data elements. To alleviate this programmer burden, Gaster and Howes outlined the channel abstraction, which facilitates dynamically aggregating asynchronously produced fine-grain work into coarser-grain tasks. However, no practical implementation has been proposed. To this end, we propose and evaluate the first channel implementation. To demonstrate the utility of channels, we present a case study that maps the fine-grain, recursive task spawning in the Cilk programming language to channels by representing it as a flow graph. To support data-parallel recursion in bounded memory, we propose a hardware mechanism that allows wavefronts to yield their execution resources. Through channels and wavefront yield, we implement four Cilk benchmarks. We show that Cilk can scale with the GPU architecture, achieving speedups of as much as 4.3x on eight compute units.
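The aggregation idea in the abstract can be illustrated with a small CPU-side sketch. This is not the paper's implementation: the `Channel` class, its `push` and `pop_batch` methods, and the batch width of 64 are all hypothetical names and values chosen for illustration. The sketch shows recursively spawned fine-grain tasks (a Fibonacci recursion tree) being collected in a queue and drained in fixed-width batches, the way a coarser-grain task might be formed for data-parallel execution. It omits Cilk's sync and continuation handling and the wavefront yield mechanism entirely.

```python
from collections import deque

WAVEFRONT_WIDTH = 64  # assumed batch width, chosen only for illustration


class Channel:
    """Toy queue that aggregates fine-grain work items (hypothetical API)."""

    def __init__(self, width=WAVEFRONT_WIDTH):
        self.width = width
        self.items = deque()

    def push(self, item):
        # Producers enqueue work one item at a time.
        self.items.append(item)

    def pop_batch(self):
        # A consumer takes up to `width` items as one coarser-grain task.
        count = min(self.width, len(self.items))
        return [self.items.popleft() for _ in range(count)]

    def __len__(self):
        return len(self.items)


def fib_via_channel(n, width=WAVEFRONT_WIDTH):
    """Compute fib(n) by summing the leaves of its recursion tree.

    Returns (result, batch_sizes), where batch_sizes records how many
    fine-grain tasks were aggregated into each dispatched batch.
    """
    channel = Channel(width)
    channel.push(n)
    total = 0
    batch_sizes = []
    while len(channel):
        batch = channel.pop_batch()
        batch_sizes.append(len(batch))
        # On a GPU, each item in the batch would map to one lane
        # executing the same function; here we simply loop.
        for k in batch:
            if k < 2:
                total += k           # leaf task: contributes fib(0) or fib(1)
            else:
                channel.push(k - 1)  # spawn two finer-grain child tasks
                channel.push(k - 2)
    return total, batch_sizes


if __name__ == "__main__":
    result, sizes = fib_via_channel(20)
    print(result, len(sizes), max(sizes))
```

Summing leaves sidesteps the need for a join step, which keeps the sketch short; a faithful mapping of Cilk's spawn and sync would also need continuation tasks, as the flow-graph representation in the abstract suggests.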