Source: Journal of VLSI signal processing

Optimization of the Adaptive Computationally-Scalable Motion Estimation and Compensation for the Hardware H.264/AVC Encoder

Abstract

The adaptive computationally-scalable motion estimation algorithm and its hardware implementation allow the H.264/AVC encoder to achieve efficiency close to optimal in real-time conditions. In particular, the search algorithm achieves close-to-optimal results even when the number of search points assigned to macroblocks is strongly limited and varies with time. The previously reported architecture implementing the algorithm takes at least 674 clock cycles to interpolate and load the reference area, and this number cannot be decreased without decreasing the search range. This paper proposes optimizations of the architecture that increase the maximal throughput of the motion estimation system by up to four times. First, chroma interpolation follows the search process, whereas luma interpolation precedes it. Second, the luma interpolator computes 128 instead of 64 samples per clock cycle. Third, the number of on-chip memories holding the interpolated reference area is increased accordingly to 128. Fourth, some modules that previously worked at the base frequency are redesigned to operate at a doubled clock. Since the on-chip memories do not store fractional-pel chroma samples, their total size is reduced from 160.44 to 104.44 kB. Additional memory savings are achieved by sequentially processing two reference-picture areas for each macroblock. The architecture is verified in a real-time FPGA hardware encoder. Synthesis results show that the updated architecture can support 2160p@30fps encoding in 0.13 µm TSMC technology with a small increase in hardware resources and some loss in compression efficiency. The efficiency improves when processing smaller resolutions.
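
To give a feel for why the 674-cycle floor matters at 2160p@30fps, the following back-of-the-envelope sketch computes the macroblock rate and the clock frequency that a fixed per-macroblock cycle count would imply. It is an illustration only, not the paper's analysis: it assumes 2160p means 3840x2160 luma samples, that macroblocks are 16x16, and that macroblocks are handled strictly one after another with no overlap between loading and searching. The function names are hypothetical, and the abstract does not state the actual clock frequency.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblock rate for a given resolution and frame rate.

    Partial macroblocks at the picture edge are rounded up.
    """
    mbs_per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return mbs_per_frame * fps


def min_clock_hz(cycles_per_mb, width, height, fps):
    """Lowest clock (Hz) that fits cycles_per_mb into each macroblock slot.

    Assumption: one macroblock at a time, no pipelining across macroblocks.
    """
    return cycles_per_mb * macroblocks_per_second(width, height, fps)


# Assumed 2160p resolution: 3840x2160 at 30 frames per second.
rate_2160p = macroblocks_per_second(3840, 2160, 30)
clock_2160p = min_clock_hz(674, 3840, 2160, 30)

print(f"macroblocks per second: {rate_2160p}")
print(f"clock implied by 674 cycles per macroblock: {clock_2160p / 1e6:.1f} MHz")
```

Under these assumptions, spending 674 cycles per macroblock on reference-area interpolation and loading alone would demand a clock of several hundred megahertz, which suggests why the paper targets the per-macroblock cycle count rather than the clock rate alone.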