GPU implementation of a parallel two-list algorithm for the subset-sum problem

Lanjun Wan; Kenli Li; Jing Liu; Keqin Li


Abstract

The subset-sum problem is a well-known non-deterministic polynomial-time complete (NP-complete) decision problem. This paper proposes a novel and efficient implementation of a parallel two-list algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It is not easy to implement the three stages of the algorithm effectively on a GPU. Ways to achieve better performance, reasonable task distribution between CPU and GPU, effective GPU memory management, and CPU–GPU communication cost minimization are discussed. The generation stage of the algorithm adopts a typical recursive divide-and-conquer strategy. Because recursion is not well supported by current GPUs with compute capability less than 3.5, a new vector-based iterative implementation mechanism is designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation, this paper improves the three stages of the algorithm. The experimental results show that the GPU implementation performs much better than the CPU implementation and can achieve high speedup on different GPU cards. The experimental results also illustrate that the improved algorithm can bring significant performance benefits for the GPU implementation.
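
To make the three stages named in the abstract concrete, here is a minimal sequential sketch of the classic two-list idea in Python. It is an illustration under stated assumptions, not the paper's CUDA implementation: the function names `subset_sums` and `two_list_subset_sum` are hypothetical, the weights are assumed to be non-negative integers, and the pruning step shown is a simple stand-in (dropping sums larger than the target) rather than the pruning stage of the parallel algorithm. The generation step is written as a loop that doubles a sorted list, which is one plausible way to avoid explicit recursion.

```python
from heapq import merge


def subset_sums(weights):
    """Return all subset sums of `weights` in ascending order.

    Built iteratively: each weight doubles the list, so no recursion is needed.
    """
    sums = [0]
    for w in weights:
        shifted = [s + w for s in sums]    # stays sorted: same offset for every entry
        sums = list(merge(sums, shifted))  # linear merge of two sorted lists
    return sums


def two_list_subset_sum(weights, target):
    """Decide whether some subset of non-negative `weights` sums to `target`."""
    half = len(weights) // 2
    # Generation stage: one sorted list of subset sums per half.
    a = subset_sums(weights[:half])
    b = subset_sums(weights[half:])
    # Simplified pruning (assumption): sums above the target cannot be part of a solution.
    a = [s for s in a if s <= target]
    b = [s for s in b if s <= target]
    # Search stage: scan `a` upward and `b` downward.
    i, j = 0, len(b) - 1
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        if total == target:
            return True
        if total < target:
            i += 1
        else:
            j -= 1
    return False
```

For n weights, each list holds at most 2^(n/2) sums, which is why the two-list approach is attractive compared with enumerating all 2^n subsets. A call such as `two_list_subset_sum([3, 34, 4, 12, 5, 2], 9)` checks whether any subset adds up to 9.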


Bibliographic details

Source
Concurrency and Computation: Practice and Experience, 2015, Issue 1, pp. 119–145
Authors
Lanjun Wan; Kenli Li; Jing Liu; Keqin Li
Author affiliations

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China;

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; National Supercomputing Center in Changsha, Changsha, Hunan 410082, China;

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China;

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; National Supercomputing Center in Changsha, Changsha, Hunan 410082, China; Department of Computer Science, State University of New York, New Paltz, New York 12561, USA;

Original format: PDF
Language: English
Keywords
CUDA; GPU implementation; knapsack problem; parallel two-list algorithm; subset-sum problem

