Conference: IEEE International Symposium on Parallel Distributed Processing

High performance comparison-based sorting algorithm on many-core GPUs

Abstract

Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sorting on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than the previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
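
The two phases named in the abstract (a bitonic sort followed by a merge sort) can be illustrated with a minimal sequential sketch in Python. This is not the authors' GPU implementation: it runs on the CPU and does not model warps, barrier elimination, or coalesced memory accesses. The names `WARP_TILE`, `bitonic_sort_tile`, `merge_pair`, and `warpsort_sketch` are hypothetical, and the tile size is an assumption chosen for illustration, not a value from the paper.

```python
from typing import List

# Hypothetical tile size (must be a power of two); not a value from the paper.
WARP_TILE = 8


def bitonic_sort_tile(tile: List[int]) -> List[int]:
    """Sort a power-of-two-length tile with a bitonic sorting network.

    Every (k, j) step applies the same compare-exchange rule to all indices,
    which is the kind of uniform work the threads of one warp could perform
    in lockstep. Here the steps simply run one after another on the CPU.
    """
    a = list(tile)
    n = len(a)
    assert n & (n - 1) == 0, "tile length must be a power of two"
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks of size k alternate between ascending and descending.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def merge_pair(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted runs into one sorted run."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def warpsort_sketch(data: List[int], tile: int = WARP_TILE) -> List[int]:
    """Phase 1: bitonic-sort fixed-size tiles. Phase 2: merge runs pairwise."""
    if not data:
        return []
    pad_value = max(data)  # padding sorts to the end of a tile and is dropped
    runs = []
    for start in range(0, len(data), tile):
        chunk = list(data[start:start + tile])
        real = len(chunk)
        chunk += [pad_value] * (tile - real)
        runs.append(bitonic_sort_tile(chunk)[:real])
    # Each pair of runs is independent of the others, so on a GPU the pairs
    # could be handed to different warps; here they are merged sequentially.
    while len(runs) > 1:
        merged = []
        for p in range(0, len(runs), 2):
            if p + 1 < len(runs):
                merged.append(merge_pair(runs[p], runs[p + 1]))
            else:
                merged.append(runs[p])
        runs = merged
    return runs[0]
```

For example, `warpsort_sketch([7, 2, 9, 4, 1])` returns `[1, 2, 4, 7, 9]`. The sketch only shows how the work is divided between the two phases; the performance figures in the abstract depend on the GPU-specific mapping, which is not reproduced here.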
