首页> 外文期刊>Journal of Parallel and Distributed Computing >Efficient local search on the GPU-Investigations on the vehicle routing problem
【24h】

Efficient local search on the GPU-Investigations on the vehicle routing problem

机译:在GPU上进行高效的本地搜索-有关车辆路线问题的调查

获取原文
获取原文并翻译 | 示例

摘要

We study how to implement local search efficiently on data parallel accelerators such as Graphics Processing Units. The Distance-constrained Capacitated Vehicle Routing Problem, a computationally very hard discrete optimization problem with high industrial relevance, is the selected vehicle for our investigations. More precisely, we investigate local search with the Best Improving strategy for the 2-opt and 3-opt operators on a giant tour representation. Resource extension functions are used for constant time move evaluation. Using CUDA, a basic implementation called The Benchmark Version has been developed and deployed on a Fermi architecture Graphics Processing Unit. Both neighborhood setup and evaluation are performed entirely on the device. The Benchmark Version is the initial step of an incremental improvement process where a number of important implementation aspects have been systematically studied. Ten well-known test instances from the literature are used in computational experiments, and profiling tools are used to identify bottlenecks. In the final version, the device is fully saturated, given a large enough problem instance. A speedup of almost an order of magnitude relative to The Benchmark Version is observed. We conclude that, with some effort, local search may be implemented very efficiently on Graphics Processing Units. Our experiments show that a maximum efficiency, however, requires a neighborhood cardinality of at least one million. Full exploration of a billion neighbors takes a few seconds and may be deemed too expensive with the current technology. Reduced neighborhoods through filtering is an obvious remedy. Experiments on simple models of neighborhood filtering indicate, however, that the speedup effect is limited on data parallel accelerators. We believe these insights are valuable in the design of new metaheuristics that fully utilize modern, heterogeneous processors.
机译:我们研究如何在数据并行加速器(例如图形处理单元)上有效地实现本地搜索。距离受限的容量车辆路径问题是一个具有很高的工业相关性的计算上非常困难的离散优化问题,是我们研究中选择的车辆。更准确地说,我们采用最佳改进策略研究本地搜索,针对大型旅行社上的2选和3选运营商。资源扩展功能用于恒定时间移动评估。使用CUDA,已经开发了称为基准版本的基本实现,并将其部署在Fermi架构的图形处理单元上。邻域设置和评估都完全在设备上执行。基准版本是逐步改进过程的第一步,在此过程中,已经对许多重要的实施方面进行了系统地研究。在计算实验中使用了来自文献的十个著名的测试实例,并使用分析工具来识别瓶颈。在最终版本中,给定足够大的问题实例,设备会完全饱和。相对于基准版本,观察到的加速几乎是一个数量级。我们得出的结论是,通过一些努力,可以在图形处理单元上非常有效地实施本地搜索。我们的实验表明,要获得最高效率,就需要至少一百万的邻域基数。对十亿个邻居进行全面探索需要花费几秒钟的时间,并且使用当前技术可能被认为过于昂贵。通过过滤减少邻居数量是一个明显的补救办法。但是,对邻域过滤的简单模型进行的实验表明,加速效果在数据并行加速器上受到限制。我们相信这些见解对于设计充分利用现代异构处理器的新元启发式技术非常有价值。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号