Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort

Abstract

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures - the latest CPU and GPU architectures. We propose novel CPU radix sort and GPU merge sort implementations which are 2X faster than previously published results. We perform a fair comparison of the algorithms using these best performing implementations on both architectures. While radix sort is faster on current architectures, the gap narrows from CPU to GPU architectures. Merge sort performs better than radix sort for sorting keys of large sizes - such keys will be required to accommodate the increasing cardinality of future databases. We present analytical models for analyzing the performance of our implementations in terms of architectural features such as core count, SIMD and bandwidth. Our models successfully predict the performance results we obtained. Our analysis points to merge sort winning over radix sort on future architectures due to its efficient utilization of SIMD and low bandwidth utilization. We simulate a 64-core platform with varying SIMD widths under constant bandwidth per core constraints, and show that at large data sizes of 2^40 (one trillion records), merge sort performance on large key sizes is up to 3X better than radix sort for large SIMD widths on future architectures. Therefore, merge sort should be the sorting method of choice for future databases.
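
The abstract's claim about large keys can be illustrated with a textbook pass-count argument. The sketch below is not the paper's analytical model or implementation. It assumes a plain LSD radix sort with a hypothetical 8-bit digit and a plain 2-way merge sort, and all function names are illustrative. It shows that the number of radix passes over the data grows with key width, while the number of merge passes depends only on the record count.

```python
import math


def radix_passes(key_bits, digit_bits=8):
    """Passes an LSD radix sort makes over the data: one per digit."""
    return math.ceil(key_bits / digit_bits)


def merge_passes(n, fan_in=2):
    """Merge passes needed to sort n records with a fan_in-way merge."""
    passes = 0
    run_length = 1
    while run_length < n:
        run_length *= fan_in  # each pass multiplies the sorted run length
        passes += 1
    return passes


def lsd_radix_sort(keys, key_bits, digit_bits=8):
    """Reference LSD radix sort for non-negative integers below 2**key_bits."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:
            # Stable distribution by the current digit.
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, "bit keys:", radix_passes(bits), "radix passes")
    print("2^40 records:", merge_passes(2**40), "two-way merge passes")
```

Under these assumptions, widening the key from 32 to 128 bits quadruples the radix passes, while the merge pass count for a fixed number of records is unchanged. A wider merge fan-in lowers the merge pass count further.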

Bibliographic details

Source
ACM SIGMOD international conference on management of data | 2010 | 12 pages
Original format: PDF
Chinese Library Classification: Programming, software engineering
Keywords
performance; algorithms

Similar literature

1. SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs [J]. Haque IS, Pande VS, Walters WP. Journal of chemical information and modeling. 2010, No. 4
2. A comparison-free sorting algorithm on CPUs and GPUs [J]. Abdel-hafeez Saleh, Gordon-Ross Ann, Abubaker Samer. Journal of supercomputing. 2018, No. 11
3. Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture [J]. Matvienko Sergey, Alemasov Nikolay, Fomin Eduard. Journal of Bioinformatics and Computational Biology. 2015, No. 1
4. Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort [C]. Nadathur Satish, Changkyu Kim, Jatin Chhugani, ACM SIGMOD international conference on management of data; SIGMOD 2010. 2010
5. Efficient Viewshed Computation Algorithms on GPUs and CPUs [D]. Qarah, Faisal F. 2020
6. SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs [O]. Imran S. Haque, Vijay S. Pande, W. Patrick Walters
7. SIML: a fast SIMD algorithm for calculating LINGO chemical similarities on GPUs and CPUs [O]. Imran S. Haque, Vijay S. P, W. Patrick Walters. 2010
