Source: The 2011 Conference on Design & Architectures for Signal and Image Processing

An efficient parallel motion estimation algorithm and X264 parallelization in CUDA

Abstract

H.264/AVC video encoders are widely used for their high coding efficiency. Because the computational demand grows with frame resolution, and resolutions keep increasing, there has been great interest in accelerating H.264/AVC through parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grained data parallelism. Despite extensive research on using GPUs to accelerate H.264/AVC encoding, prior efforts have not achieved any speed-up over x264, which is known as the fastest CPU implementation, because of two obstacles: significant communication overhead between the host CPU and the GPU, and intra-frame dependencies in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, that effectively hides the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves a speed-up of more than 20% over x264.
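The abstract does not say how sub-frame ME processing is scheduled, so the following is only a rough timing model of the general idea of hiding transfer time behind computation, not the authors' method. It assumes a two-stage pipeline in which the host-to-GPU transfer of one sub-frame overlaps with motion estimation on the previous one. The function names and all timing values are hypothetical, and the transfer of results back to the host is ignored.

```python
def sequential_time(transfer, compute):
    """Total time when every transfer finishes before its compute starts,
    and nothing overlaps."""
    return sum(transfer) + sum(compute)


def pipelined_time(transfer, compute):
    """Total time for a two-stage pipeline (one copy engine, one compute engine).

    Sub-frame i starts computing only after its own transfer has finished
    and after sub-frame i-1 has finished computing.
    """
    transfer_done = 0.0
    compute_done = 0.0
    for t, c in zip(transfer, compute):
        transfer_done += t  # transfers run back to back
        compute_done = max(compute_done, transfer_done) + c
    return compute_done


# Hypothetical numbers: one frame split into 4 equal sub-frames,
# each needing 1 ms of transfer and 3 ms of GPU compute.
transfer_ms = [1.0] * 4
compute_ms = [3.0] * 4

print(sequential_time(transfer_ms, compute_ms))  # 16.0
print(pipelined_time(transfer_ms, compute_ms))   # 13.0
```

Under these made-up numbers, only the first sub-frame's transfer remains exposed. The other three transfers are hidden behind computation, which is the kind of effect the abstract attributes to pipelining at sub-frame granularity.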