
BPLG-BMCS: GPU-sorting algorithm using a tuning skeleton library


Abstract

In this work, we present an efficient and portable sorting operator for GPUs. Specifically, we propose an algorithmic variant of the bitonic merge sort which reduces the number of processing stages and internal steps, increasing the workload per thread and focusing on a multi-batch execution for multiple problems of a small size. This proposal is well matched to current GPU architectures, and we apply different CUDA optimizations to improve performance. For portability, we use a library based on tuning building blocks. Thanks to this parametrization, the library can easily be tuned for different CUDA GPU architectures. Our proposals obtain competitive performance on two recent NVIDIA GPU architectures, providing an improvement of up to 11.794× over CUDPP and up to 6.467× over ModernGPU.
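
For orientation, the stage and step structure that the abstract refers to can be sketched with the textbook bitonic sorting network applied to a batch of small problems. This is a minimal CPU-side illustration in Python, not the BMCS variant or the BPLG library itself: the names `bitonic_sort` and `sort_batch` are hypothetical, each problem is assumed to have a power-of-two length, and none of the paper's stage-reduction or CUDA optimizations are modeled.

```python
def bitonic_sort(values):
    """Sort one small problem with the textbook bitonic network.

    Assumption: len(values) is a power of two (0 and 1 are accepted).
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:              # stage: merge bitonic runs of length k
        j = k // 2
        while j >= 1:          # step: compare-exchange elements j apart
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate direction until the final stage.
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


def sort_batch(batch):
    """Sort many independent small problems (the multi-batch setting)."""
    return [bitonic_sort(problem) for problem in batch]
```

In this sketch every problem in the batch is independent, which is what makes the multi-batch setting attractive on a GPU: each small problem could be assigned to its own group of threads, and every compare-exchange within one step touches a disjoint pair of elements.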

Bibliographic record

  • Source
    Journal of Supercomputing | 2017, Issue 1 | pp. 4-16 | 13 pages
  • Author affiliations

    Univ A Coruna, GAC, Dept Elect & Sistemas, Fac Informat, Campus Coruna, La Coruna 15071, Spain;

    Univ A Coruna, GAC, Dept Elect & Sistemas, Fac Informat, Campus Coruna, La Coruna 15071, Spain;

    Univ A Coruna, GAC, Dept Elect & Sistemas, Fac Informat, Campus Coruna, La Coruna 15071, Spain;

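  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords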

    GPU; CUDA; Tuning; Building blocks; Bitonic merge sort;

