首页> 外文会议>ACM SIGMOD international conference on management of data;SIGMOD 2010 >Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort
【24h】

Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort

机译:在CPU和GPU上进行快速排序:带宽忽略型SIMD排序的案例

获取原文

摘要

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures - the latest CPU and GPU architectures. We propose novel CPU radix sort and GPU merge sort implementations which are 2X faster than previously published results. We perform a fair comparison of the algorithms using these best performing implementations on both architectures. While radix sort is faster on current architectures, the gap narrows from CPU to GPU architectures. Merge sort performs better than radix sort for sorting keys of large sizes - such keys will be required to accommodate the increasing cardinality of future databases. We present analytical models for analyzing the performance of our implementations in terms of architectural features such as core count, SIMD and bandwidth. Our obtained performance results are successfully predicted by our models. Our analysis points to merge sort winning over radix sort on future architectures due to its efficient utilization of SIMD and low bandwidth utilization. We simulate a 64-core platform with varying SIMD widths under constant bandwidth per core constraints, and show that large data sizes of 2~(40) (one trillion records), merge sort performance on large key sizes is up to 3X better than radix sort for large SIMD widths on future architectures. Therefore, merge sort should be the sorting method of choice for future databases.
机译:排序是许多数据库操作中使用的基本内核。内存中排序现在是可行的。排序性能受计算触发器和主内存带宽而不是I / O的限制。在本文中,我们对两种现代架构(最新的CPU和GPU架构)上基于比较和非比较的排序算法进行了竞争性分析。我们提出了新颖的CPU基数排序和GPU合并排序实现,这些实现比以前发布的结果快2倍。我们使用两种体系结构上性能最佳的实施方案对算法进行公平的比较。尽管在当前架构上基数排序更快,但是从CPU到GPU架构的差距在缩小。对于大尺寸的键排序,Merge排序比基数排序要好-需要这种键来适应未来数据库不断增长的基数。我们提供了分析模型,用于根据体系结构特征(例如核心数,SIMD和带宽)来分析实现的性能。我们的模型成功地预测了我们获得的性能结果。我们的分析指出,由于对SIMD的有效利用和低带宽利用率,在未来的体系结构中合并排序胜过基数排序。我们模拟了一个64核心平台,该平台在每个核心约束恒定带宽下具有可变SIMD宽度,并显示2〜(40)个大数据大小(一万亿条记录),大键大小上的合并排序性能比基数高3倍。在将来的体系结构上对较大的SIMD宽度进行排序。因此,合并排序应该是将来数据库选择的排序方法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号