CUDA-quicksort: an improved GPU-based implementation of quicksort

Manca Emanuele; Manconi Andrea; Orro Alessandro; Armano Giuliano; Milanesi Luciano

首页> 外文期刊>Concurrency and computation: practice and experience >CUDA-quicksort: an improved GPU-based implementation of quicksort

【24h】

CUDA-quicksort: an improved GPU-based implementation of quicksort

机译：CUDA-quicksort：改进的基于GPU的quicksort实现

获取原文

获取原文并翻译 | 示例

开具论文收录证明 >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

Sorting is a very important task in computer science and becomes a critical operation for programs making heavy use of sorting algorithms. General-purpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU-based implementations of the quicksort were presented in literature: the GPU-quicksort, a compute-unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort an iterative GPU-based implementation of the sorting algorithm. CUDA-quicksort has been designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth analysis of the performance between CUDA-quicksort and GPU-quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using the CUDA dynamic parallelism, we implemented a recursive version of the CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright © 2015 John Wiley & Sons, Ltd.

机译：排序是计算机科学中非常重要的任务，对于大量使用排序算法的程序而言，排序成为一项至关重要的操作。通用计算已成功用于图形处理单元（GPU）上，以并行化某些排序算法。文献中介绍了两种基于GPU的快速排序实现：GPU快速排序（一种计算统一的设备体系结构（CUDA）迭代实现）和CUDA动态并行（CDP）快速排序（一种由NVIDIA Corporation提供的递归实现）。我们提出CUDA-quicksort，它是基于迭代GPU的排序算法实现。 CUDA-quicksort是从GPU-quicksort开始设计的。与GPU快速排序不同，它使用原子原语执行块间通信，同时确保对GPU内存的优化访问。对六个排序基准分布进行的实验表明，CUDA-quicksort比GPU-quicksort快四倍，比CDP-quicksort快三倍。对CUDA-quicksort和GPU-quicksort之间的性能进行的深入分析表明，主要的改进与优化的GPU内存访问有关，而不是与原子基元的使用有关。此外，为了评估使用CUDA动态并行性的优势，我们实现了CUDA-quicksort的递归版本。实验结果表明，CUDA-quicksort比NVIDIA提供的CDP-quicksort更快，并且使用迭代实现可实现更好的性能。版权所有©2015 John Wiley＆Sons，Ltd.

著录项

来源
《Concurrency and computation: practice and experience》 |2016年第1期|21-43|共23页
作者
Manca Emanuele; Manconi Andrea; Orro Alessandro; Armano Giuliano; Milanesi Luciano;
展开▼
作者单位

The University of Cagliari Department of Electrical and Electronic Engineering Cagliari Italy;

National Research Council Institute for Biomedical Technologies Segrate (MI) Italy;

National Research Council Institute for Biomedical Technologies Segrate (MI) Italy;

The University of Cagliari Department of Electrical and Electronic Engineering Cagliari Italy;

National Research Council Institute for Biomedical Technologies Segrate (MI) Italy;

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类
关键词
high performance computing; GPU; CUDA; quick sort;

机译：高性能计算;GPU;CUDA;快速排序;

相似文献

外文文献
中文文献
专利

1. Fast Quicksort Implementation Using AVX Instructions [J] . Shay Gueron, Vlad Krasnov The Computer journal . 2016,第1期

机译：使用AVX指令快速实施Quicksort
2. A Faster Implementation Of Quicksort [J] . MANSOOR AL-AALI WSEAS Transactions on Information Science and Applications . 2007,第1期

机译：快速排序的快速实现
3. Improving Multikey Quicksort For Sorting Strings With Many Equal Elements [J] . Eunsang Kim, Kunsoo Park Information Processing Letters . 2009,第9期

机译：改进用于对具有多个相等元素的字符串进行排序的多键Quicksort
4. A parallel implementation of hybridized merge-quicksort algorithm on MPICH [C] . Trimananda Rahmadi, Haryanto Christoforus Yoga 2010 International Conference on Distributed Frameworks for Multimedia Applications . 2010

机译：混合合并快速排序算法在MPICH上的并行实现
5. GPU-based acceleration techniques: Algorithms, implementations, and applications [D] . Xiao, Kai 2015

机译：基于GPU的加速技术：算法，实现和应用程序
6. Mobile GPU-based implementation of automatic analysis method for long-term ECG [O] . Xiaomao Fan, Qihang Yao, Ye Li, 2018

机译：基于移动GPU的长期心电图自动分析方法的实现
7. A Simple, Fast Parallel Implementation of Quicksort and its Performance Evaluation on SUN Enterprise 10000 [O] . Philippas Tsigas, Yi Zhang 2007

机译：sUN企业简化快速并行实现Quicksort及其性能评估

CUDA-quicksort: an improved GPU-based implementation of quicksort

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅