首页> 外文会议>ACM SIGMOD international conference on management of data >Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort
【24h】

Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort

机译:快速排序CPU和GPU:带宽忘记SIMD排序的情况

获取原文

摘要

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures - the latest CPU and GPU architectures. We propose novel CPU radix sort and GPU merge sort implementations which are 2X faster than previously published results. We perform a fair comparison of the algorithms using these best performing implementations on both architectures. While radix sort is faster on current architectures, the gap narrows from CPU to GPU architectures. Merge sort performs better than radix sort for sorting keys of large sizes - such keys will be required to accommodate the increasing cardinality of future databases. We present analytical models for analyzing the performance of our implementations in terms of architectural features such as core count, SIMD and bandwidth. Our obtained performance results are successfully predicted by our models. Our analysis points to merge sort winning over radix sort on future architectures due to its efficient utilization of SIMD and low bandwidth utilization. We simulate a 64-core platform with varying SIMD widths under constant bandwidth per core constraints, and show that large data sizes of 2~(40) (one trillion records), merge sort performance on large key sizes is up to 3X better than radix sort for large SIMD widths on future architectures. Therefore, merge sort should be the sorting method of choice for future databases.
机译:排序是许多数据库操作中使用的基本内核。内存中的类型现在是可行的;排序性能受Compute FLOPS和主内存带宽而不是I / O的限制。在本文中,我们对两个现代架构的比较和非比较算法进行了竞争分析 - 最新的CPU和GPU架构。我们提出了新的CPU基数和GPU合并排序实现,其比以前发布的结果快2倍。我们使用这些架构上的这些最佳执行实现对算法进行了公平的比较。虽然当前架构上的Radix排序更快,但间隙从CPU缩小到GPU架构。 Merge Sort比RADIX排序更好地进行大尺寸的分拣键 - 将需要这种键来适应未来数据库的增加的基数。我们提出了分析模型,用于分析我们在核心计数,SIMD和带宽等架构特征方面进行实现的性能。我们所获得的绩效结果由我们的模型成功预测。由于其高效利用SIMD和低带宽利用率,我们的分析点分类为未来架构的赢取排序。我们模拟了一个64核平台,在每个核心约束的恒定带宽下变化的SIMD宽度,并显示了2〜(40)(一万亿录)的大数据尺寸,比基数更好地合并排序性能高达3倍。对未来体系结构的大型SIMD宽度进行排序。因此,合并排序应该是未来数据库选择的排序方法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号