

Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards



Abstract

In this paper we describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block-matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and a full grid search (FS) to find the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids.

The results show that using a GPU card can shorten computation time by a factor of 200 for an integer search grid and by a factor of 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the GPU's built-in hardware for image interpolation. Further, when multiple GPU cards are used, the evaluation shows that the method of splitting the data across the cards matters, but that an almost linear speedup with the number of cards is achievable.

In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized, non-full-grid-search CPU-based motion estimation methods: the implementation of the pyramidal Lucas-Kanade optical flow algorithm in OpenCV, and the simplified unsymmetrical multi-hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement, even though its computational complexity is substantially higher than that of the non-FS CPU implementations.

We also demonstrated that for an image sequence with a resolution of 720×480 pixels, commonly used in video surveillance, the proposed GPU implementation is fast enough for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.
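To make the SAD criterion and full grid search concrete, here is a minimal CPU sketch in plain Python. It is not the paper's CUDA implementation: it covers only the integer search grid, and the function names (`sad`, `full_search`), the block size, the search radius, and the toy frames are all assumptions chosen for illustration.

```python
def sad(ref, cur, bx, by, dx, dy, block):
    """Summed absolute difference between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(ref, cur, bx, by, block=4, radius=2):
    """Evaluate every integer displacement within +/- radius and return
    (dx, dy, sad) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would leave the reference frame.
            if bx + dx < 0 or bx + dx + block > width:
                continue
            if by + dy < 0 or by + dy + block > height:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, block)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy data (hypothetical): a 12x12 reference frame with a quadratic intensity
# pattern, and a current frame built so that cur[y][x] == ref[y + 1][x + 2].
H, W = 12, 12
ref = [[x * x + 100 * y * y for x in range(W)] for y in range(H)]
cur = [[ref[y + 1][x + 2] if (y + 1 < H and x + 2 < W) else 0
        for x in range(W)] for y in range(H)]

result = full_search(ref, cur, bx=4, by=4)
print(result)  # (2, 1, 0): the block matches exactly at dx=2, dy=1
```

Each candidate displacement is scored independently of the others, which is what makes the full search a natural fit for the parallel GPU execution the abstract describes; the non-integer grid would additionally require interpolating `ref` at fractional positions.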

Bibliographic record

Authors: Francesc Massanes; Marie Cadennes; Jovan G. Brankov
Volume (issue): 20(3)
Page: 033004
Total pages: 24
Original format: PDF

Similar documents

1. Francesc Massanes, Marie Cadennes, and Jovan G. Brankov. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards [J]. Journal of Electronic Imaging, 2011, no. 3.
2. M. Kannan, S.K. Srivatsa. Hardware Implementation of Instruction Level Parallel Architecture Incorporating Special Functional Units for Image Processing Algorithms [J]. Information Technology Journal, 2006, no. 3.
3. Kristopher C. Breen, Jesus Hernandez Tapia, Duncan G. Elliott. Implementation of Three SIMD Algorithms for Graphical User Interface Processing in Mobile Devices Using the Atsana J2210 Media Processor [C]. Canadian Conference on Electrical and Computer Engineering, 2005.
4. Harsh Vardhan Dwivedi. Analysis and implementation of Room Assignment problem and Cannon's algorithm on general purpose programmable graphical processing units with CUDA [D]. 2011.
5. S.N. Swetadri Vasan, Ciprian N. Ionita, A.H. Titus. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O].
6. Sangkyun Lee, Stephen J. Wright. Implementing Algorithms for Signal and Image Reconstruction on Graphical Processing Units [O]. 2012.
7. Designing and Implementing an OVERFLOW Reader for ParaView and Comparing Performance Between Central Processing Units and Graphical Processing Units [R]. 2010.
