GPU sample sort

Abstract

In this paper, we present the design of a sample sort algorithm for manycore GPUs. Although sample sort is one of the most efficient comparison-based sorting algorithms for distributed memory architectures, its performance on GPUs was previously unknown. For uniformly distributed keys, our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys, it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort, which directly manipulates the binary representation of keys. Our implementation is robust to different distributions and entropy levels of keys and scales almost linearly with the input size. These results indicate that multi-way techniques in general, and sample sort in particular, achieve substantially better performance than two-way merge sort and quicksort.
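For readers unfamiliar with the multi-way idea the abstract refers to, the following is a minimal sequential sketch of sample sort in Python. It is not the paper's GPU implementation: the function name `sample_sort` and the parameters `num_buckets`, `oversampling`, and `threshold` are hypothetical and chosen only for illustration. It shows the three general steps of the technique: draw a random sample and pick splitters, distribute every key to one of k buckets in a single pass, then sort each bucket.

```python
import bisect
import random


def sample_sort(keys, num_buckets=4, oversampling=8, threshold=32, rng=None):
    """Sequential illustration of k-way sample sort (not a GPU kernel)."""
    rng = rng or random.Random(0)
    # Small inputs: fall back to the built-in sort.
    if len(keys) <= threshold or num_buckets < 2:
        return sorted(keys)

    # Step 1: draw a random sample, sort it, and take num_buckets - 1
    # evenly spaced elements as splitters.
    sample = sorted(rng.choice(keys) for _ in range(num_buckets * oversampling))
    splitters = [sample[i * oversampling] for i in range(1, num_buckets)]

    # Step 2: one pass over the input; binary search over the splitters
    # gives each key a bucket index in the range 0 .. num_buckets - 1.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Step 3: sort each bucket and concatenate. Buckets are already
    # ordered relative to one another by construction.
    result = []
    for bucket in buckets:
        if len(bucket) == len(keys):
            # Every key landed in one bucket (e.g. many duplicates);
            # recursing would make no progress, so sort directly.
            result.extend(sorted(bucket))
        else:
            result.extend(
                sample_sort(bucket, num_buckets, oversampling, threshold, rng)
            )
    return result
```

Compared with two-way quicksort or merge sort, each pass here splits the data into k parts rather than two, so fewer passes over the input are needed. That reduction in passes is the general property of multi-way techniques that the abstract's conclusion points to.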
