【24h】

GPU sample sort

机译:GPU样本排序

获取原文
获取原文并翻译 | 示例

摘要

In this paper, we present the design of a sample sort algorithm for manycore GPUs. Despite being one of the most efficient comparison-based sorting algorithms for distributed memory architectures its performance on GPUs was previously unknown. For uniformly distributed keys our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort that directly manipulates the binary representation of keys. Our implementation is robust to different distributions and entropy levels of keys and scales almost linearly with the input size. These results indicate that multi-way techniques in general and sample sort in particular achieve substantially better performance than two-way merge sort and quicksort.
机译:在本文中,我们提出了用于多核GPU的样本排序算法的设计。尽管它是分布式内存体系结构中最有效的基于比较的排序算法之一,但其在GPU上的性能以前是未知的。对于均匀分布的键,我们的样本排序比基于最佳比较的最佳排序算法GPU Thrust合并排序至少快25%,平均要快68%,平均比GPU快速排序快2倍以上。此外,对于64位整数键,它比直接操作键的二进制表示的高度优化的GPU Thrust基数排序至少快63%,平均快2倍。我们的实现对键的不同分布和熵级别具有鲁棒性,并且几乎随输入大小线性变化。这些结果表明,与双向合并排序和快速排序相比,常规的多路技术尤其是样本排序实现了更好的性能。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号