Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer



Abstract

We present a framework, based on the QuickSched[1] library, that implements priority-aware task-based parallelism directly on CUDA GPUs. This allows large computations with complex data dependencies to be executed in a single GPU kernel call, removing any synchronization points that might otherwise be required between kernel calls. Using this paradigm, data transfers to and from the GPU are modelled as load and unload tasks. These tasks are automatically generated and executed alongside the computational tasks, allowing fully asynchronous and concurrent data transfers. We implemented a tiled QR decomposition and a Barnes-Hut gravity calculation, both of which show significant improvement when using the task-based setup, effectively eliminating any latencies due to data transfers between the GPU and the CPU. This shows that task-based parallelism is a valid alternative programming paradigm on GPUs, and can provide significant gains from both a data-transfer and an ease-of-use perspective.
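The idea of modelling data transfers as load and unload tasks can be illustrated with a small host-side sketch. This is not the QuickSched API and not CUDA code: the `Task` class, the `with_transfer_tasks` and `schedule` functions, and the rule that a load task inherits the highest priority of the tasks waiting on it are all assumptions made for illustration. The sketch only shows how transfer tasks could be generated automatically from the resources each computational task touches, and how a priority-aware scheduler would then order them together with the computation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int = 0                        # larger value runs first among ready tasks
    reads: tuple = ()                        # resources the task only reads
    writes: tuple = ()                       # resources the task modifies
    deps: set = field(default_factory=set)   # names of tasks that must finish first


def with_transfer_tasks(compute_tasks):
    """Add one load task per touched resource and one unload task per written one."""
    tasks = {t.name: Task(t.name, t.priority, t.reads, t.writes, set(t.deps))
             for t in compute_tasks}
    for t in compute_tasks:
        # dict.fromkeys keeps a deterministic order and drops duplicates.
        for res in dict.fromkeys(t.reads + t.writes):
            load = f"load:{res}"
            if load not in tasks:
                tasks[load] = Task(load, t.priority)
            # Assumption: a load is as urgent as its most urgent consumer.
            tasks[load].priority = max(tasks[load].priority, t.priority)
            tasks[t.name].deps.add(load)
        for res in t.writes:
            unload = f"unload:{res}"
            # The unload waits for every task that writes the resource.
            tasks.setdefault(unload, Task(unload)).deps.add(t.name)
    return tasks


def schedule(tasks):
    """Priority-aware topological order (Kahn's algorithm with a heap)."""
    waiting = {name: len(t.deps) for name, t in tasks.items()}
    dependents = {name: [] for name in tasks}
    for name, t in tasks.items():
        for dep in t.deps:
            dependents[dep].append(name)
    # heapq is a min-heap, so priorities are negated; names break ties.
    ready = [(-tasks[n].priority, n) for n, count in waiting.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for nxt in dependents[name]:
            waiting[nxt] -= 1
            if waiting[nxt] == 0:
                heapq.heappush(ready, (-tasks[nxt].priority, nxt))
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical two-task computation: factor tile A, then use it to update tile B.
compute = [
    Task("factor_A", priority=2, writes=("A",)),
    Task("update_B", priority=1, reads=("A",), writes=("B",), deps={"factor_A"}),
]
order = schedule(with_transfer_tasks(compute))
print(order)
```

In this serial sketch the load of tile B is ordered after `factor_A` has started, because only `update_B` needs it. On a real device, independent tasks that are ready at the same time could run concurrently, which is where transfers would overlap with computation.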


Bibliographic details

Source
International Conference series on Parallel Computing | 2016 | xx 850 pages | 14 pages in total
Authors
Aidan B G CHALK; Pedro GONNET; Matthieu SCHALLER
Original format: PDF
Chinese Library Classification: TP338.6-532
Keywords
Task-based parallelism; general-purpose GPU computing; asynchronous data transfer


Similar literature


1. Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses [J]. David Tarditi, Sidd Puri, Jose Oglesby. Computer architecture news. 2006, No. 5
2. Accelerator: Using Data Parallelism to Program GPUs for General-Purpose Uses [J]. David Tarditi, Sidd Puri, Jose Oglesby. Operating systems review. 2006, No. 5
3. PRAND: GPU accelerated parallel random number generation library: Using most reliable algorithms and applying parallelism of modern GPUs and CPUs [J]. L.Yu. Barash, L.N. Shchur. Computer physics communications. 2014, No. 4
4. Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer [C]. Aidan B G CHALK, Pedro GONNET, Matthieu SCHALLER. International Conference series on Parallel Computing. 2016
5. Exploiting Data-Parallelism in GPUs [D]. Zhang, Yongpeng. 2012
6. GLMdenoise: a fast automated technique for denoising task-based fMRI data [O]. Kendrick N. Kay, Ariel Rokem, Jonathan Winawer. 2013
7. SWIFT: using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores [O]. Schaller Matthieu, Gonnet Pedro, Chalk Aidan B. G. 2016
