Super Scalar Sample Sort

Abstract

Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching using predicated instructions. This transformation facilitates optimizations like loop unrolling and software pipelining. The final implementation, albeit cache efficient, is limited by a linear number of memory accesses rather than the O(n log n) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL library, which is known as one of the fastest available quicksort implementations.
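The idea of decoupling comparisons from branches can be illustrated with a minimal sketch. It assumes k - 1 sorted splitters (k a power of two) stored as an implicit binary search tree, so that each comparison result is used as an integer in index arithmetic instead of steering a conditional jump. The names `SplitterTree`, `make_splitter_tree`, `classify`, and `distribute` are hypothetical, the element type is fixed to `int` for brevity, and the two-pass distribution is one plausible way to use the bucket indices, not a reproduction of the authors' implementation. Whether the comparison really compiles to branch-free code (predication, `setcc`, or a conditional move) depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: splitters in heap order (node[1] is the root,
// children of node[j] are node[2j] and node[2j+1]); node[0] is unused.
struct SplitterTree {
    std::vector<int> node;
    std::size_t k;      // number of buckets, assumed to be a power of two
    unsigned log_k;     // tree depth, log2(k)
};

// Place the median of sorted[lo, hi) at tree position j, then recurse.
static void fill(const std::vector<int>& sorted, std::vector<int>& node,
                 std::size_t j, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    std::size_t mid = lo + (hi - lo) / 2;
    node[j] = sorted[mid];
    fill(sorted, node, 2 * j, lo, mid);
    fill(sorted, node, 2 * j + 1, mid + 1, hi);
}

SplitterTree make_splitter_tree(const std::vector<int>& sorted_splitters) {
    SplitterTree t;
    t.k = sorted_splitters.size() + 1;
    assert(t.k >= 2 && (t.k & (t.k - 1)) == 0);  // k must be a power of two
    t.log_k = 0;
    while ((std::size_t{1} << t.log_k) < t.k) ++t.log_k;
    t.node.assign(t.k, 0);
    fill(sorted_splitters, t.node, 1, 0, sorted_splitters.size());
    return t;
}

// Descend the tree for exactly log_k steps. The comparison result (0 or 1)
// is added to the index, so the loop body has no data-dependent branch and
// the trip count is fixed, which makes unrolling straightforward.
std::size_t classify(int x, const SplitterTree& t) {
    std::size_t j = 1;
    for (unsigned level = 0; level < t.log_k; ++level)
        j = 2 * j + static_cast<std::size_t>(x > t.node[j]);
    return j - t.k;  // leaf index mapped to a bucket number in [0, k)
}

// Two-pass distribution: record each element's bucket, turn the bucket
// counts into start offsets, then scatter elements to their buckets.
std::vector<int> distribute(const std::vector<int>& in, const SplitterTree& t) {
    std::vector<std::size_t> bucket_of(in.size());
    std::vector<std::size_t> start(t.k + 1, 0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        bucket_of[i] = classify(in[i], t);
        ++start[bucket_of[i] + 1];
    }
    for (std::size_t b = 0; b < t.k; ++b) start[b + 1] += start[b];
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[start[bucket_of[i]]++] = in[i];
    return out;
}
```

With splitters 10, 20, 30, an element x lands in bucket 0 if x <= 10, bucket 1 if 10 < x <= 20, bucket 2 if 20 < x <= 30, and bucket 3 otherwise. A full sort would then sort each bucket recursively or with a small-input algorithm; that step is omitted here.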
