Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

Abstract

We address the problem of efficiently executing a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU-accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The cooperative use of multiple CPUs and GPUs attains speedups of 50× and 85× with respect to single-core CPU executions for morphological reconstruction and Euclidean distance transform, respectively.
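To make the pattern concrete, the following is a minimal sequential sketch of the IWPP applied to grayscale morphological reconstruction, one of the two operations named in the abstract. It is not the paper's implementation: the function name `reconstruct`, the 4-connected neighborhood, the single FIFO queue, and the choice to seed the wavefront with every pixel are all assumptions made for illustration. The multi-level GPU queue and the tile-based multi-device scheduling described above are not shown.

```python
from collections import deque


def reconstruct(marker, mask):
    """Grayscale morphological reconstruction by dilation (4-connected).

    Illustrative sketch only: a single sequential queue stands in for the
    wavefront. `marker` and `mask` are equally sized lists of lists.
    """
    rows, cols = len(mask), len(mask[0])
    # Clip the marker so it never exceeds the mask.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
           for r in range(rows)]

    # Assumption: seed the wavefront with every pixel. This is simple and
    # correct, but a tuned version would seed only pixels able to propagate.
    wavefront = deque((r, c) for r in range(rows) for c in range(cols))

    while wavefront:
        r, c = wavefront.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            # Propagation condition: the neighbor can be raised, limited
            # by its own mask value.
            candidate = min(out[r][c], mask[nr][nc])
            if candidate > out[nr][nc]:
                out[nr][nc] = candidate
                # The neighbor that received the wave joins the wavefront.
                wavefront.append((nr, nc))
    return out


mask = [[3, 3, 0, 5],
        [3, 3, 0, 5]]
marker = [[3, 0, 0, 0],
          [0, 0, 0, 1]]
print(reconstruct(marker, mask))  # [[3, 3, 0, 1], [3, 3, 0, 1]]
```

In this example the zero-valued mask column blocks propagation, so the value 3 fills only the left region and the value 1 fills only the right one. Which pixels enter the queue depends on the image content, which is the source of the irregular data accesses the abstract refers to.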