High performance comparison-based sorting algorithm on many-core GPUs

Abstract

Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than the previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
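The two-phase structure described in the abstract (a bitonic sort followed by a merge sort) can be illustrated with a minimal sequential sketch. This is not the paper's CUDA implementation: the function names `bitonic_sort_chunk` and `warpsort_sketch`, the chunk size of 8, and the `+inf` padding are assumptions made for illustration only, and warp-level details such as barrier elimination and coalesced memory access are not modeled.

```python
from heapq import merge


def bitonic_sort_chunk(values):
    """Sort a power-of-two-sized list in place with a bitonic network."""
    n = len(values)
    assert n and n & (n - 1) == 0, "chunk length must be a power of two"
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare distance within this stage
            # Each pass of this loop is an independent compare-exchange;
            # on a GPU the passes for one stage would run in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (values[i] > values[partner]) == ascending:
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return values


def warpsort_sketch(data, chunk=8):
    """Hypothetical two-phase sort: bitonic-sorted chunks, then pairwise merges."""
    # Pad to a multiple of the chunk size (assumption: +inf sentinels).
    padded = list(data) + [float("inf")] * (-len(data) % chunk)
    # Phase 1: sort each fixed-size chunk with the bitonic network.
    runs = [bitonic_sort_chunk(padded[s:s + chunk])
            for s in range(0, len(padded), chunk)]
    # Phase 2: repeatedly merge independent pairs of sorted runs.
    while len(runs) > 1:
        runs = [list(merge(*runs[a:a + 2])) for a in range(0, len(runs), 2)]
    # Drop the sentinels, which sort to the end.
    return runs[0][:len(data)] if runs else []
```

For example, `warpsort_sketch([5, 3, 9, 1])` returns `[1, 3, 5, 9]`. Each pair merged in phase 2 is independent of the others, which is the property the abstract relies on when it assigns separate pairs of sequences to separate warps.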

Bibliographic record

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) | 2010 | pp. 1-10 | 10 pages
Conference location: Atlanta, GA (US)
Authors
Xiaochun Ye; Dongrui Fan; Wei Lin; Nan Yuan; Ienne P.
Author affiliation

Key Lab. of Comput. Syst. Archit., Chinese Acad. of Sci., Beijing, China;

Original format: PDF
Text language: English
Chinese Library Classification: TP311.133
Keywords
Bitonic Network; CUDA; GPU; Many-Core; Merge Sort; Sorting Algorithm;

Similar literature

1. Parallel Shellsort Algorithm for Many-Core GPUs with CUDA [J]. Chun-Yuan Lin, Wei Sheng Lee, Chuan Yi Tang. International Journal of Grid and High Performance Computing. 2012, Issue 2
2. Combining high productivity and high performance in image processing using Single Assignment C on multi-core CPUs and many-core GPUs [J]. Volkmar Wieser, Clemens Grelck, Peter Haslinger. Journal of Electronic Imaging. 2012, Issue 2
3. Comparing performance of many-core CPUs and GPUs for static and motion compensated reconstruction of C-arm CT data [J]. Hofmann HG, Keck B, Rohkohl C. Medical Physics. 2011, Issue 1
4. High performance comparison-based sorting algorithm on many-core GPUs [C]. Ye Xiaochun, Fan Dongrui, Lin Wei. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010
5. Toward Performance Portability for CPUs and GPUs through Algorithmic Compositions [D]. Chang, Li-Wen. 2017
6. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O]. S.N. Swetadri Vasan, Ciprian N. Ionita, A.H. Titus
7. High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs [O]. Xiaochun Ye, Dongrui Fan, Wei Lin. 2010
