Journal: Parallel Algorithms and Applications

Accelerating computation of Euclidean distance map using the GPU with efficient memory access


Abstract

Recent graphics processing units (GPUs), which have many processing units, can be used for general-purpose parallel computation, and they are now widely used this way to exploit their computing power. Since GPUs have very high memory bandwidth, their performance depends greatly on how memory is accessed. The main contribution of this paper is a GPU implementation that computes the Euclidean distance map (EDM) with efficient memory access. Given a two-dimensional (2D) binary image, the EDM is a 2D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. The proposed GPU implementation takes into account programming issues of the GPU system such as coalesced access to global memory and shared memory bank conflicts. Concretely, the 2D arrays that hold temporary data in global memory are transposed through shared memory, so that the main accesses to and from global memory can be performed as coalesced accesses. We have implemented our parallel algorithm on the following three modern GPU systems: Tesla C1060, GTX 480 and GTX 580. The experimental results show that, for an input binary image of size 9216 × 9216, our implementation achieves a speedup factor of 54 over the sequential implementation.
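The transposition step mentioned in the abstract can be illustrated with a generic tiled transpose. The sketch below is not the authors' kernel; it is the common shared-memory pattern, written under stated assumptions: a 32 × 32 tile (`TILE`), one extra padding column to avoid shared memory bank conflicts, and `float` elements. The names `transposeTiled` and `transposeMatchesReference` are hypothetical and chosen for illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed tile width. A 32 x 32 block has 1024 threads, which needs
// compute capability 2.0 or later; older devices would need a smaller block.
constexpr int TILE = 32;

// Transposes a rows x cols row-major matrix `in` into a cols x rows matrix `out`.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols) {
    // One padding column so that reading a tile column touches distinct banks.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TILE + threadIdx.y;  // row in `in`
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];  // coalesced read

    __syncthreads();

    // Swap the block indices so consecutive threads write consecutive addresses.
    x = blockIdx.y * TILE + threadIdx.x;  // column in `out` (row in `in`)
    y = blockIdx.x * TILE + threadIdx.y;  // row in `out` (column in `in`)
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Hypothetical host helper: runs the kernel and compares with a CPU transpose.
bool transposeMatchesReference(int rows, int cols) {
    std::vector<float> h_in(rows * cols), h_out(rows * cols, -1.0f);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = h_in.size() * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(d_in, d_out, rows, cols);

    // The blocking copy also waits for the kernel to finish.
    bool ok = cudaMemcpy(h_out.data(), d_out, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    // out[c][r] must equal in[r][c] for every element.
    for (int r = 0; ok && r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (h_out[c * rows + r] != h_in[r * cols + c]) { ok = false; break; }
    return ok;
}

int main() {
    // Dimensions that are not multiples of TILE exercise the bounds checks.
    bool ok = transposeMatchesReference(100, 70);
    std::printf("%s\n", ok ? "transpose OK" : "transpose FAILED");
    return ok ? 0 : 1;
}
```

In a two-pass distance computation, a transpose of this kind would let a column-wise pass be run as a row-wise pass over the transposed array, so that neighbouring threads touch neighbouring addresses. Whether the paper structures its passes exactly this way is not stated in the abstract.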
