Journal: Concurrency and Computation: Practice and Experience

A radix sorting parallel algorithm suitable for graphic processing unit computing


Abstract

Radix sorting is a basic data processing operation used in many areas of computing, so accelerating it on a graphics processing unit (GPU) is of practical importance. Heterogeneous parallel computing has attracted much attention and is widely applied because of its computational efficiency and its capability for parallel, real-time data processing. Taking advantage of GPU parallelism in numerical computation, this paper proposes a parallelization design method, based on Open Computing Language (OpenCL), for the Binary_Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm. The radix sorting algorithm is divided into multiple kernel tasks, and the kernels are executed in sequence under the control of event information transfer. The parallel algorithm is implemented and verified on a GPU + CPU heterogeneous platform. Experimental results on the NVIDIA GTX 1070 computing platform show that the OpenCL-based B_LSD_RS parallel algorithm achieves speedups of 28.86 times, 11.01 times, and 2.14 times over, respectively, the sequential B_LSD_RS algorithm on an AMD Ryzen5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi-Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA). The algorithm thus achieves not only high performance but also performance portability among different GPU computing platforms.
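
As a rough illustration of what a binary LSD-first radix sort does, the sketch below sorts non-negative integers one bit at a time, starting from the least significant bit. It is written in plain sequential Python and is not the paper's OpenCL implementation. The split of each pass into flag, scan, and scatter stages is an assumption chosen because it is a common way to express such a pass as data-parallel steps, and the abstract does not say which kernels the authors actually use. The names `split_by_bit`, `binary_lsd_radix_sort`, and `key_bits` are hypothetical.

```python
from itertools import accumulate


def split_by_bit(keys, bit):
    """One stable pass: keys with `bit` clear come first, then keys with it set."""
    # Stage 1 (flag): 1 where the bit is clear, 0 where it is set.
    zero_flags = [1 - ((k >> bit) & 1) for k in keys]
    # Stage 2 (scan): exclusive prefix sum = number of zero-keys before index i.
    zero_offsets = [0] + list(accumulate(zero_flags))[:-1]
    total_zeros = sum(zero_flags)
    # Stage 3 (scatter): zero-keys keep their relative order at the front;
    # one-keys follow, also in their original relative order.
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        if zero_flags[i]:
            dest = zero_offsets[i]
        else:
            dest = total_zeros + (i - zero_offsets[i])
        out[dest] = k
    return out


def binary_lsd_radix_sort(keys, key_bits=32):
    """Sort a list of non-negative integers below 2**key_bits, lowest bit first."""
    keys = list(keys)
    for bit in range(key_bits):
        # Each pass depends on the output of the previous one, so the
        # passes must run strictly in order.
        keys = split_by_bit(keys, bit)
    return keys
```

Because every pass is stable, ordering established by lower bits survives the passes over higher bits, which is why the result is fully sorted after the last bit. The strict pass-to-pass dependency in the loop is the kind of ordering that the abstract describes as being enforced between kernels through event information transfer.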
