2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

High performance comparison-based sorting algorithm on many-core GPUs

Abstract

Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. First, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in the bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting fully coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and that it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithms on input sequences with millions of elements.
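The two-phase structure described in the abstract (bitonic sort on small tiles, then merging of sorted sequences) can be illustrated with a minimal CPU-side sketch in Python. This is not the authors' GPU implementation: the tile size `CHUNK = 32`, the function names `bitonic_sort` and `warpsort_sketch`, the infinity padding, and the use of `heapq.merge` for the merge phase are all assumptions made for illustration, and the sketch assumes numeric input. The inner loop over `i` stands in for the lanes of a warp, which on a GPU would run the same compare-exchange step in lockstep.

```python
from heapq import merge  # stdlib lazy merge of already-sorted iterables

CHUNK = 32  # hypothetical tile size (assumption: one element per warp lane)


def bitonic_sort(tile):
    """Sort a power-of-two-length list with a bitonic sorting network."""
    n = len(tile)
    assert n & (n - 1) == 0, "bitonic network needs a power-of-two length"
    a = list(tile)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare-exchange distance within this stage
            # Each i models one lane; pairs (i, i ^ j) are disjoint, so the
            # sequential loop gives the same result as a lockstep execution.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def warpsort_sketch(data):
    """Sort numbers: bitonic sort per tile, then pairwise merging of runs."""
    if not data:
        return []
    # Pad with +inf so every tile has exactly CHUNK elements (assumption).
    padded = list(data) + [float("inf")] * (-len(data) % CHUNK)
    runs = [bitonic_sort(padded[i:i + CHUNK])
            for i in range(0, len(padded), CHUNK)]
    # Merge independent pairs of sorted runs until a single run remains.
    while len(runs) > 1:
        merged = [list(merge(runs[i], runs[i + 1]))
                  for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out is carried to next round
        runs = merged
    return runs[0][:len(data)]  # drop the padding


print(warpsort_sketch([5, 3, 9, 1]))  # prints [1, 3, 5, 9]
```

Within each stage of the network every position performs the same compare-exchange against a partner at a fixed distance, which is the property the abstract relies on to avoid barriers and branch divergence inside a warp. Each pair merge in the second phase is independent of the others, matching the idea of handing each warp its own pair of sequences.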
