Inter-block GPU communication via fast barrier synchronization


Abstract

While GPGPU stands for general-purpose computation on graphics processing units, the lack of explicit support for inter-block communication on the GPU arguably hampers its broader adoption as a general-purpose computing device. Inter-block communication on the GPU occurs via global memory and then requires barrier synchronization across the blocks, i.e., inter-block GPU communication via barrier synchronization. Currently, such synchronization is only available via the CPU, which, in turn, can incur significant overhead. We propose two approaches for inter-block GPU communication via barrier synchronization: GPU lock-based synchronization and GPU lock-free synchronization. We then evaluate the efficacy of each approach via a micro-benchmark as well as three well-known algorithms: Fast Fourier Transform (FFT), dynamic programming, and bitonic sort. For the micro-benchmark, the experimental results show that our GPU lock-free synchronization performs 8.4 times faster than CPU explicit synchronization and 4.0 times faster than CPU implicit synchronization. When integrated with the FFT, dynamic programming, and bitonic sort algorithms, our GPU lock-free synchronization further improves performance by 10%, 26%, and 40%, respectively, and ultimately delivers an overall speed-up of 70x, 13x, and 24x, respectively.
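The abstract does not spell out how either barrier is built. As a rough illustration of the general idea of a barrier that participants cross by updating and polling a shared counter, the sketch below uses CPU threads as stand-ins for GPU thread blocks and a `std::atomic<int>` as a stand-in for a counter in global memory. All names (`CounterBarrier`, `run_phases`) are hypothetical, and this is not the authors' implementation; it only shows why a write phase followed by a barrier lets every participant safely read what the others wrote.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration: CPU threads play the role of GPU thread blocks.
struct CounterBarrier {
    std::atomic<int> arrived{0};  // stand-in for a counter in global memory

    // Each participant increments the counter once, then spins until the
    // counter reaches `goal`. The counter is never reset; the caller passes
    // a goal that grows with every barrier crossing.
    void wait(int goal) {
        arrived.fetch_add(1, std::memory_order_acq_rel);
        while (arrived.load(std::memory_order_acquire) < goal) {
            std::this_thread::yield();  // let other participants make progress
        }
    }
};

// Runs `rounds` phases over `blocks` workers. In each phase a worker writes
// its own slot, crosses the barrier, and reads its neighbour's slot.
// Returns the number of stale reads observed (0 when the barrier works).
int run_phases(int blocks, int rounds) {
    assert(blocks > 0 && rounds >= 0);
    std::vector<int> shared(blocks, -1);
    std::atomic<int> stale{0};
    CounterBarrier barrier;
    std::vector<std::thread> workers;

    for (int b = 0; b < blocks; ++b) {
        workers.emplace_back([&, b] {
            for (int r = 0; r < rounds; ++r) {
                shared[b] = r;                       // write phase
                barrier.wait(blocks * (2 * r + 1));  // all writes are now visible
                if (shared[(b + 1) % blocks] != r) {
                    stale.fetch_add(1);              // neighbour's write was missed
                }
                barrier.wait(blocks * (2 * r + 2));  // all reads finish before the next write
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return stale.load();
}
```

Calling `run_phases(4, 200)` should report zero stale reads. On a real GPU the same pattern additionally assumes that every block is resident at the same time, since a block that spins while waiting for a block that has not yet been scheduled would never finish.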

Bibliographic record

Source: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-12 (12 pages)
Conference location: Atlanta, GA (US)
Authors: Xiao Shucai; Feng Wu-chun
Author affiliation: Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061
Original format: PDF
Language: English
Chinese Library Classification: TP311.133

Similar literature

1. Using Inter-Block Synchronization to Improve the Knapsack Problem on GPUs [J]. Xue Sun, Chao-Chin Wu, Liang-Rui Chen. International Journal of Grid and High Performance Computing. 2018, No. 4
2. StreamScan: Fast Scan Algorithms for GPUs without Global Barrier Synchronization [J]. Shengen Yan, Guoping Long, Yunquan Zhang. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2013, No. 8
3. Enhancement of membrane computing model implementation on GPU by introducing matrix representation for balancing occupancy and reducing inter-block communications [J]. Ali Maroosi, Ravie Chandren Muniyandi. Journal of Computational Science. 2014, No. 6
4. Inter-block GPU communication via fast barrier synchronization [C]. Shucai Xiao, Wu-chun Feng. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010
5. Micro-architectural support for improving synchronization and efficiency of SIMD execution on GPUs [D]. Yilmazer, Ayse. 2013
6. GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs [O]. Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta
7. Inter-Block GPU Communication via Fast Barrier Synchronization [O]. Shucai Xiao, Wu-chun Feng. 2010
