High performance comparison-based sorting algorithm on many-core GPUs

Abstract

Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, that performs comparison-based parallel sorting on Graphics Processing Units (GPUs). It consists mainly of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks onto the GPU architecture. First, we exploit the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide enough homogeneous parallel operations for all threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to merge and by using fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions and achieves up to 30% higher performance than the previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
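
The two-phase structure described above (a bitonic sort followed by a merge sort) can be illustrated with a serial Python sketch. It is not the authors' CUDA implementation: it assumes a tile of 32 elements (the hypothetical `WARP_SIZE`), pads the input with +inf so every tile has a power-of-two length, and uses a plain two-way merge; the names `bitonic_sort_tile`, `merge_pair`, and `warpsort_sketch` are illustrative only. It shows why each layer of the bitonic network needs no barrier inside a warp (a layer is a set of independent compare-exchanges) and why merge pairs can be handed to warps independently, but it does not model coalesced memory access or branch divergence.

```python
WARP_SIZE = 32  # hypothetical tile size: one element per lane of a 32-thread warp


def bitonic_sort_tile(tile):
    """Sort a list whose length is a power of two using a bitonic network.

    Every (k, j) pass below is one layer of independent compare-exchange
    operations. On a GPU, each lane of a warp could take one of them, and
    because a warp's threads run in lockstep, no barrier is needed between
    layers. This serial sketch simply runs the layers one after another.
    """
    n = len(tile)
    assert n & (n - 1) == 0, "a bitonic network needs a power-of-two length"
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and tile[i] > tile[partner]) or (
                        not ascending and tile[i] < tile[partner]
                    ):
                        tile[i], tile[partner] = tile[partner], tile[i]
            j //= 2
        k *= 2
    return tile


def merge_pair(a, b):
    """Two-way merge of two sorted lists (the job one warp gets in this sketch)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def warpsort_sketch(data, tile=WARP_SIZE):
    """Bitonic-sort fixed-size tiles, then merge sorted runs pairwise."""
    # Assumption: pad with +inf so every tile has a power-of-two length;
    # the padding sorts to the end and is sliced off before returning.
    padded = list(data) + [float("inf")] * (-len(data) % tile)
    # Phase 1: sort each tile independently with the bitonic network.
    runs = [bitonic_sort_tile(padded[s:s + tile])
            for s in range(0, len(padded), tile)]
    # Phase 2: each round merges disjoint pairs of runs; the pairs are
    # independent, so on a GPU each pair could be handed to its own warp.
    while len(runs) > 1:
        merged = [merge_pair(runs[p], runs[p + 1])
                  for p in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an odd run out waits for the next round
            merged.append(runs[-1])
        runs = merged
    return runs[0][:len(data)] if runs else []


print(warpsort_sketch([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

In a real GPU version, the inner `for i in range(n)` loop of each layer would be spread across the lanes of a warp, and the `merge_pair` calls within one round would run on different warps.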

Bibliographic record

Source: IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, 10 pages
Authors: Xiaochun Ye; Dongrui Fan; Wei Lin; Nan Yuan; P. Ienne
Original format: PDF
CLC classification: TP311.138-53
Keywords: Bitonic Network; CUDA; GPU; Many-Core; Merge Sort; Sorting Algorithm

Similar literature

1. Parallel Shellsort Algorithm for Many-Core GPUs with CUDA [J]. Chun-Yuan Lin, Wei Sheng Lee, Chuan Yi Tang. International Journal of Grid and High Performance Computing, 2012, No. 2.
2. Combining high productivity and high performance in image processing using Single Assignment C on multi-core CPUs and many-core GPUs [J]. Volkmar Wieser, Clemens Grelck, Peter Haslinger. Journal of Electronic Imaging, 2012, No. 2.
3. Comparing performance of many-core CPUs and GPUs for static and motion compensated reconstruction of C-arm CT data [J]. Hofmann HG, Keck B, Rohkohl C. Medical Physics, 2011, No. 1.
4. High performance comparison-based sorting algorithm on many-core GPUs [C]. Xiaochun Ye, Dongrui Fan, Wei Lin. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
5. Toward Performance Portability for CPUs and GPUs through Algorithmic Compositions [D]. Li-Wen Chang. 2017.
6. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O]. S.N. Swetadri Vasan, Ciprian N. Ionita, A.H. Titus.
7. High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs [O]. Xiaochun Ye, Dongrui Fan, Wei Lin. 2010.
