Journal of VLSI signal processing systems

Algorithm and Software Optimization of Variable Block Size Motion Estimation for H.264/AVC on a VLIW-SIMD DSP

Abstract

We implemented H.264/AVC variable block size motion estimation (VBSME) on a very long instruction word (VLIW), single instruction multiple data (SIMD) digital signal processor (DSP). The SADReuse method, which has a regular structure, was chosen for VBSME because it both removes redundant sum of absolute differences (SAD) operations and exploits the instruction-level parallelism (ILP) and data-level parallelism (DLP) of the architecture. A fast mode decision algorithm was developed to reduce the number of 'compare and update' operations and to simplify rate-distortion optimization (RDO); it uses the difference of motion vectors and maximum a posteriori (MAP) estimation of the rate-distortion costs. Several advanced software techniques are employed, including software pipelining and packed-data processing. In particular, schemes that reduce memory access overhead, including multi-block processing and inter-procedural scheduling, are used for software optimization. To reduce 'write buffer full' stalls in quarter-pixel ME, a 4-bit quantization scheme was developed; it increases the number of arithmetic operations but greatly decreases the stall cycles. On the TMS320C64x DSP architecture, the implemented variable block size ME for H.264/AVC requires an average of 9 Mcycles and 78 Mcycles per frame for QCIF and CIF video sequences, respectively.
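
The SAD reuse idea mentioned in the abstract can be illustrated with a minimal C sketch. It is not the authors' implementation: the names `vbs_sads`, `sad4x4`, and `vbs_sad_reuse` are hypothetical, the current and reference blocks are assumed to share one stride, and the code is plain scalar C rather than TMS320C64x packed-data code. For one candidate motion vector it computes the sixteen 4x4 SADs of a 16x16 macroblock once, then obtains the SADs of all larger partitions (8x4, 4x8, 8x8, 16x8, 8x16, 16x16) purely by addition, giving 41 SADs in total.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container: the 41 partition SADs of one 16x16 macroblock
 * at a single candidate motion vector. All arrays are in raster order. */
typedef struct {
    uint32_t s4x4[16];  /* 4 columns x 4 rows of 4x4 blocks        */
    uint32_t s8x4[8];   /* 2 columns x 4 rows (8 wide, 4 tall)     */
    uint32_t s4x8[8];   /* 4 columns x 2 rows (4 wide, 8 tall)     */
    uint32_t s8x8[4];   /* 2 columns x 2 rows                      */
    uint32_t s16x8[2];  /* top half, bottom half                   */
    uint32_t s8x16[2];  /* left half, right half                   */
    uint32_t s16x16;    /* whole macroblock                        */
} vbs_sads;

/* Direct SAD of one 4x4 block; this is the only place pixels are read. */
static uint32_t sad4x4(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sum;
}

/* Compute the 16 base SADs, then build every larger partition by addition.
 * Assumption: cur and ref point at 16x16 blocks with the same stride. */
void vbs_sad_reuse(const uint8_t *cur, const uint8_t *ref, int stride,
                   vbs_sads *out)
{
    for (int by = 0; by < 4; by++)
        for (int bx = 0; bx < 4; bx++) {
            int off = 4 * by * stride + 4 * bx;
            out->s4x4[by * 4 + bx] = sad4x4(cur + off, ref + off, stride);
        }

    /* 8x4: two horizontally adjacent 4x4 blocks. */
    for (int by = 0; by < 4; by++)
        for (int bx = 0; bx < 2; bx++)
            out->s8x4[by * 2 + bx] = out->s4x4[by * 4 + 2 * bx]
                                   + out->s4x4[by * 4 + 2 * bx + 1];

    /* 4x8: two vertically adjacent 4x4 blocks. */
    for (int by = 0; by < 2; by++)
        for (int bx = 0; bx < 4; bx++)
            out->s4x8[by * 4 + bx] = out->s4x4[(2 * by) * 4 + bx]
                                   + out->s4x4[(2 * by + 1) * 4 + bx];

    /* 8x8: two vertically adjacent 8x4 blocks. */
    for (int by = 0; by < 2; by++)
        for (int bx = 0; bx < 2; bx++)
            out->s8x8[by * 2 + bx] = out->s8x4[(2 * by) * 2 + bx]
                                   + out->s8x4[(2 * by + 1) * 2 + bx];

    /* 16x8, 8x16 and 16x16 are sums of the four 8x8 SADs. */
    out->s16x8[0] = out->s8x8[0] + out->s8x8[1];
    out->s16x8[1] = out->s8x8[2] + out->s8x8[3];
    out->s8x16[0] = out->s8x8[0] + out->s8x8[2];
    out->s8x16[1] = out->s8x8[1] + out->s8x8[3];
    out->s16x16   = out->s16x8[0] + out->s16x8[1];
}
```

In this sketch each pixel difference is evaluated exactly once per candidate vector, and the merge steps are short, branch-free loops of additions. That regular structure is the kind that lends itself to software pipelining and packed-data arithmetic, though how the paper maps it onto the DSP is not shown here.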
