首页> 外文期刊>Software >A high-performance sorting algorithm for multicore single-instruction multiple-data processors
【24h】

A high-performance sorting algorithm for multicore single-instruction multiple-data processors

机译:一种用于多核单指令多数据处理器的高性能排序算法

获取原文
获取原文并翻译 | 示例
       

摘要

Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effec-tively exploit both single-instruction multiple-data (SIMD) instructions and thread-level parallelism. In this paper, we propose a new high-performance sorting algorithm, called aligned-access sort (AA-sort), that exploits both the SIMD instructions and thread-level parallelism available on today's multicore processors. Our algorithm consists of two phases, an in-core sorting phase and an out-of-core merging phase. The in-core sorting phase uses our new sorting algorithm that extends combsort to exploit SIMD instructions. The out-of-core algorithm is based on mergesort with our novel vectorized merging algorithm. Both phases can take advantage of SIMD instructions. The key to high performance is eliminating unaligned memory accesses that would reduce the effectiveness of SIMD instructions in both phases. We implemented and evaluated the AA-sort on PowerPC 970MP and Cell Broadband Engine platforms. In summary, a sequential version of the AA-sort using SIMD instructions outperformed IBM's optimized sequential sorting library by 1.8 times and bitonic mergesort using SIMD instructions by 3.3 times on PowerPC 970MP when sorting 32 million random 32-bit integers. Also, a parallel version of AA-sort demonstrated better scalability with increasing numbers of cores than a parallel version of bitonic mergesort on both platforms.
机译:过去已经研究了许多排序算法,但是只有少数几种算法可以有效地利用单指令多数据(SIMD)指令和线程级并行性。在本文中,我们提出了一种新的高性能排序算法,称为对齐访问排序(AA-sort),该算法利用了当今多核处理器上可用的SIMD指令和线程级并行性。我们的算法包括两个阶段,内核内排序阶段和内核外合并阶段。核心排序阶段使用了我们的新排序算法,该算法扩展了梳理以利用SIMD指令。核外算法基于与我们新颖的矢量化合并算法的mergesort。这两个阶段均可利用SIMD指令。高性能的关键是消除不对齐的内存访问,这将降低两个阶段中SIMD指令的有效性。我们在PowerPC 970MP和Cell Broadband Engine平台上实施并评估了AA分类。总之,在PowerPC 970MP上对3200万个32位随机整数进行排序时,使用SIMD指令的AA排序的顺序版本比IBM优化的顺序排序库要高1.8倍,而使用SIMD指令的bitonic mergesort的性能要高3.3倍。同样,与两种平台上的并行版本的Bitonic Mergesort相比,并行版本的AA-sort表现出更好的可扩展性,且内核数量不断增加。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号