A radix sorting parallel algorithm suitable for graphic processing unit computing

Xiao Shi-yang; Li Cai-lin; Guo Bao-yun; Xiao Han


Abstract

Radix sorting is a basic data processing operation in many areas of computing, and accelerating it on a Graphic Processing Unit (GPU) is of practical importance. Heterogeneous parallel computing attracts much attention and is widely applied for its computational efficiency and its capability for parallel, real-time data processing. Taking advantage of the GPU's parallelism in numerical computation, a parallel design of the Binary_Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm based on Open Computing Language (OpenCL) is proposed. The radix sorting algorithm is divided into multiple kernel tasks, and the kernels are executed in sequence, with their order controlled through event information transfer. The parallel algorithm is implemented and verified on a GPU + CPU heterogeneous platform. The experimental results show that, on the NVIDIA GTX 1070 computing platform, the OpenCL-based B_LSD_RS parallel algorithm achieves speedups of 28.86 times, 11.01 times and 2.14 times over, respectively, the sequential B_LSD_RS algorithm on an AMD Ryzen5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi-Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA). The algorithm thus achieves both high performance and performance portability across different GPU computing platforms.
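
The abstract does not spell out how the sort is split into kernels, so the following is only a minimal sequential sketch of a binary LSD radix sort, written in Python for readability. It assumes one bit is processed per pass and that each pass consists of three steps (flag, exclusive prefix sum, scatter), which is a common way to decompose such a sort into data-parallel stages. The function name `binary_lsd_radix_sort`, the `bits` parameter, and the three-step split are illustrative assumptions, not the authors' implementation.

```python
from itertools import accumulate


def binary_lsd_radix_sort(keys, bits=32):
    """Sort non-negative integers below 2**bits, one bit per pass, LSD first.

    Hypothetical sketch: each numbered step stands in for what could be a
    separate GPU kernel; here the steps simply run one after another.
    """
    src = list(keys)
    n = len(src)
    for bit in range(bits):
        # Step 1 (flag): mark keys whose current bit is 0.
        zero_flags = [1 - ((k >> bit) & 1) for k in src]

        # Step 2 (scan): exclusive prefix sum gives each zero-key its slot.
        inclusive = list(accumulate(zero_flags))
        zero_pos = [0] + inclusive[:-1] if n else []
        total_zeros = inclusive[-1] if n else 0

        # Step 3 (scatter): stable split, zeros first, then ones.
        dst = [0] * n
        for i, k in enumerate(src):
            if zero_flags[i]:
                dst[zero_pos[i]] = k
            else:
                # i - zero_pos[i] is the number of one-keys before index i.
                dst[total_zeros + (i - zero_pos[i])] = k
        src = dst
    return src


if __name__ == "__main__":
    print(binary_lsd_radix_sort([5, 3, 9, 0, 3, 7], bits=4))
```

Because every pass is a stable split, the keys are fully ordered after the most significant bit has been processed. In a GPU version each step would have to finish before the next one starts, which is the kind of ordering the abstract attributes to event information transfer between kernels.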


Bibliographic details

Source
Concurrency and computation: practice and experience | 2021, Issue 6 | e5818.1-e5818.15 | 15 pages
Authors
Xiao Shi-yang; Li Cai-lin; Guo Bao-yun; Xiao Han
Author affiliations

Northeast Forestry Univ Sch Civil Engn Harbin Peoples R China;

Shandong Univ Technol Sch Civil & Architectural Engn Zibo Shandong Peoples R China;

Shandong Univ Technol Sch Civil & Architectural Engn Zibo Shandong Peoples R China;

Zhengzhou Normal Univ Sch Informat Sci & Technol Zhengzhou Henan Peoples R China;

Indexing information
Original format: PDF
Language: English
Keywords
dataset; graphic processing unit (GPU); heterogeneous platform; open computing language (OpenCL); parallel algorithm; radix sorting;


Similar literature

1. GPUDePiCt: A Parallel Implementation of a Clustering Algorithm for Computing Degenerate Primers on Graphics Processing Units [J]. Cickovski Trevor, Flor Tiffany, Irving-Sachs Galen. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015, Issue 2
2. Parallelizing flow-accumulation calculations on graphics processing units-From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm [J]. Cheng-Zhi Qin, Lijun Zhan. Computers & geosciences. 2012
3. Parallel Implementation of Membrane Computing-Inspired Clustering Algorithm on Graphics Processing Unit [J]. Hong Peng, Jie Jin, Jun Wang. Journal of computational and theoretical nanoscience. 2016, Issue 6
4. Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit [C]. Delorme Michael C., Abdelrahman Tarek S., Zhao Chengyan. International Conference on Parallel Processing. 2013
5. Parallel Algorithms and Dynamic Data Structures on the Graphics Processing Unit: a Warp-Centric Approach [D]. Ashkiani, Saman. 2017
6. Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units [O]. Ivan Komarov, Roshan M. D'Souza
7. Algorithm for automatic loop parallelization for graphics processing units [O]. A.Yu. Doroshenko, I.Z. Achour. 2018
