...
首页> 外文期刊>Concurrency, practice and experience >GPU-accelerated backtracking using CUDA Dynamic Parallelism
【24h】

GPU-accelerated backtracking using CUDA Dynamic Parallelism

机译:使用CUDA动态并行技术加速GPU的回溯

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

NewGPGPUtechnologies, such asCUDADynamicParallelism (CDP), can help dealingwith recursive patterns of computation, such as divide-and-conquer, used by backtracking algorithms. In this paper, we propose a GPU-accelerated backtracking algorithm using CDP that extends a well-known parallel backtracking model. The search starts on CPU, processing the search tree until a first cutoff depth. Based on this partial backtracking tree, the algorithm analyzes the memory requirements of subsequent kernel generations. The proposed algorithm performs no dynamic allocation of memory on GPU, unlike related works from the literature. The proposed algorithm has been extensively tested using the N-Queens Puzzle problem and instances of the Asymmetric Traveling Salesman Problem (ATSP) as test-cases. The proposed CDP algorithmmay, under some conditions, outperform its non-CDP counterpart by a factor up to 25. But, it may also be up to twice slower. The CDP-based implementation has much betterworst case execution times andmakes algorithm's performance less dependent on the tuning of parameters. Compared to otherCDP-based strategies from the literature, the proposed algorithm is on average 8×faster. The proposed algorithm is also hybridized with another CDP-based strategy from the literature. The combination of strategies is in average 4.5× faster than the related strategy. We also identify some difficulties, limitations, and bottlenecks concerning theCDPprogramming modelwhich may be useful for helping potential users.
机译:诸如CUDAD动态并行化(CDP)之类的新GPGPU技术可以帮助处理回溯算法所使用的递归计算模式,例如分治法。在本文中,我们提出了一种使用CDP的GPU加速回溯算法,该算法扩展了众所周知的并行回溯模型。搜索从CPU开始,处理搜索树直到第一个截止深度。基于此部分回溯树,该算法分析了后续内核生成的内存需求。与文献中的相关工作不同,所提出的算法没有在GPU上执行动态内存分配。使用N-Queens难题问题和非对称旅行商问题(ATSP)实例作为测试案例,对该算法进行了广泛的测试。在某些情况下,建议的CDP算法的性能可能比非CDP算法高25倍。但是,它的速度也可能要慢两倍。基于CDP的实现在最坏情况下的执行时间要好得多,并使算法的性能更少地依赖于参数的调整。与文献中其他基于CDP的策略相比,该算法平均要快8倍。所提出的算法还与文献中另一种基于CDP的策略进行了混合。这些策略的组合平均比相关策略快4.5倍。我们还发现了有关CDP编程模型的一些困难,局限性和瓶颈,这可能对帮助潜在用户很有用。

著录项

  • 来源
    《Concurrency, practice and experience》 |2018年第9期|e4374.1-e4374.19|共19页
  • 作者单位

    ParGO Research Group (Parallelism, Graphs, and Optimization), Mestrado Doutorado em Ciência da Computação, Universidade Federal do Ceará, Fortaleza, Brazil;

    Mathematics and Operational Research Department (MARO), University of Mons, Mons, Belgium,INRIA Lille Nord Europe, Université Lille 1, CNRS/CRIStAL, Villeneuve-d'Ascq, France;

    ParGO Research Group (Parallelism, Graphs, and Optimization), Mestrado Doutorado em Ciência da Computação, Universidade Federal do Ceará, Fortaleza, Brazil;

    INRIA Lille Nord Europe, Université Lille 1, CNRS/CRIStAL, Villeneuve-d'Ascq, France;

    Mathematics and Operational Research Department (MARO), University of Mons, Mons, Belgium;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    CUDA dynamic parallelism; depth-first search; GPU computing; parallel backtracking;

    机译:CUDA动态并行性;深度优先搜索;GPU计算;并行回溯;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号