首页> 外文期刊>Concurrency and computation: practice and experience >GPU implementation of a parallel two-list algorithm for the subset-sum problem
【24h】

GPU implementation of a parallel two-list algorithm for the subset-sum problem

机译:针对子集和问题的并行两列表算法的GPU实现

获取原文
获取原文并翻译 | 示例

摘要

The subset-sum problem is a well-known non-deterministic polynomial-time complete (NP-complete)rndecision problem. This paper proposes a novel and efficient implementation of a parallel two-list algorithmrnfor solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecturern(CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It isrnnot easy to effectively implement the three stages of the algorithm on a GPU. Ways to achieve betterrnperformance, reasonable task distribution between CPU and GPU, effective GPU memory management,rnand CPU–GPU communication cost minimization are discussed. The generation stage of the algorithmrnadopts a typical recursive divide-and-conquer strategy. Because recursion cannot be well supported by currentrnGPUs with compute capability less than 3.5, a new vector-based iterative implementation mechanismrnis designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation,rnthis paper improves the three stages of the algorithm. The experimental results show that the GPUrnimplementation has much better performance than the CPU implementation and can achieve high speeduprnon different GPU cards. The experimental results also illustrate that the improved algorithm can bringrnsignificant performance benefits for the GPU implementation.
机译:子集和问题是众所周知的非确定性多项式时间完全(NP-complete)决策问题。本文提出了一种新颖的并行两列表算法的实现,该算法使用Compute Unified Device Architecture(CUDA)解决图形处理单元(GPU)上的问题。该算法由生成阶段,修剪阶段和搜索阶段组成。在GPU上有效地实现算法的三个阶段并不容易。讨论了实现更好的性能,在CPU和GPU之间合理分配任务,有效的GPU内存管理以及将CPU-GPU通信成本最小化的方法。该算法的生成阶段采用典型的递归分治策略。由于递归不能被当前计算能力低于3.5的GPU很好地支持,因此设计了一种新的基于向量的迭代实现机制来替代显式递归。此外,为了优化GPU实现的性能,本文改进了算法的三个阶段。实验结果表明,GPU实现比CPU实现具有更好的性能,并且可以在不同GPU卡之间实现较高的速度。实验结果还表明,改进后的算法可以为GPU实现带来巨大的性能优势。

著录项

  • 来源
  • 作者单位

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China National Supercomputing Center in Changsha, Changsha, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China National Supercomputing Center in Changsha, Changsha, Hunan 410082, China Department of Computer Science, State University of New York, New Paltz, New York 12561, USA;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    CUDA; GPU implementation; knapsack problem; parallel two-list algorithm; subset-sumrnproblem;

    机译:CUDA;GPU实施;背包问题;并行两列表算法;子集和问题;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号