
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards



Abstract

We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block-matching algorithm uses the sum of absolute differences (SAD) error criterion and a full grid search (FS) to find the optimal block displacement. In this evaluation, we compared the execution time of a GPU and a CPU implementation for images of various sizes, using integer and noninteger search grids. The results show that a GPU card can shorten computation time by a factor of 200 for an integer search grid and by a factor of 1000 for a noninteger search grid. The additional speedup for a noninteger search grid comes from the GPU's built-in hardware for image interpolation. Further, the evaluation shows that, with multiple GPU cards, the method used to split data across the cards matters, and that an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized, non-full-grid-search CPU-based motion estimation methods: the implementation of the pyramidal Lucas-Kanade optical flow algorithm in OpenCV and the simplified unsymmetrical multi-hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement, even though its computational complexity is substantially higher than that of the non-FS CPU implementations. We also demonstrated that, for an image sequence with a resolution of 720 × 480 pixels, which is commonly used in video surveillance, the proposed GPU implementation is fast enough for real-time motion estimation at 30 frames per second using two NVIDIA Tesla C1060 GPU cards.
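
To make the SAD criterion and the full grid search concrete, here is a minimal CPU reference sketch in plain Python. It is not the authors' CUDA implementation: the function names `sad` and `full_search`, the list-of-lists image layout, and the rule of skipping candidates that fall outside the reference frame are all assumptions made for illustration, and only the integer search grid is covered.

```python
def sad(ref, cur, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block at (bx, by)
    in `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(ref, cur, bx, by, bs, radius):
    """Test every integer displacement within +/- radius (full grid search)
    and return (dx, dy, sad) for the lowest SAD. Candidates that would read
    outside the reference frame are skipped (an illustrative assumption)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= bx + dx and bx + dx + bs <= w and
                      0 <= by + dy and by + dy + bs <= h)
            if not inside:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, bs)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 12 x 12 reference frame in which every pixel value is unique.
SIZE = 12
ref = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]

# Current frame: the reference content shifted by 2 pixels in x and 1 in y
# (pixels with no source in the reference are set to 0).
cur = [[ref[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]

# 4 x 4 block at (4, 4), search radius 3.
print(full_search(ref, cur, 4, 4, 4, 3))  # (2, 1, 0)
```

Each candidate displacement is scored independently of the others, which is the property that lets a full search be spread across many GPU threads and, with a suitable data split, across several cards.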

Bibliographic record

  • Source
    Journal of Electronic Imaging | 2011, Issue 3 | pp. 033004.1-033004.10 | 10 pages
  • Author affiliations

    Illinois Institute of Technology, Medical Imaging Research Center, Chicago, Illinois 60616;

    Illinois Institute of Technology, Medical Imaging Research Center, Chicago, Illinois 60616;

    Illinois Institute of Technology, Medical Imaging Research Center, Chicago, Illinois 60616;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • CLC classification
  • Keywords
