Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

A Memory-Efficient and Highly Parallel Architecture for Variable Block Size Integer Motion Estimation in H.264/AVC



Abstract

Variable block size motion estimation (VBSME) is one of several contributors to H.264/AVC's excellent coding efficiency. However, its high computational complexity and heavy memory traffic make design difficult. In this paper, we propose a memory-efficient and highly parallel VLSI architecture for full-search VBSME (FSVBSME). Our architecture consists of sixteen 2-D arrays, each consisting of $16 \times 16$ processing elements (PEs). Four arrays form a group that matches four reference blocks against one current block in parallel. Four groups perform block matching for four current blocks in a pipelined fashion. Taking advantage of the overlap among the multiple reference blocks of a current block, and between the search windows of adjacent current blocks, we propose a novel data reuse scheme to reduce memory access. Compared with the popular Level C data reuse scheme, our approach can save 98% of on-chip memory access with only 25% local memory overhead. Synthesized with a TSMC 180-nm CMOS cell library, our design is capable of processing $1920 \times 1088$ video at 30 fps when running at 130 MHz. The architecture is scalable to a wider search range, multiple reference frames, and pixel truncation as well as downsampling. We suggest a criterion called design efficiency for comparing different works. It shows that the proposed design is 72% more efficient than the best design to date.
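As a rough illustration of what variable block size matching involves, the sketch below computes the sum of absolute differences (SAD) for the sixteen 4x4 sub-blocks of one macroblock and adds them up to obtain the SADs of the larger H.264 partitions. This 4x4 SAD reuse is a common way to organize VBSME and is an assumption here; the abstract does not say how the PE arrays combine partial sums. All function names are hypothetical. The second part is plain arithmetic on the figures quoted in the abstract: the average cycle budget per macroblock at the stated resolution, frame rate, and clock.

```python
import numpy as np


def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 block pair."""
    diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32))
    # Axes after reshape: (block row, pixel row, block column, pixel column).
    return diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))


def variable_block_sads(s4):
    """Combine 4x4 SADs into SADs for all seven H.264 partition sizes.

    Keys are 'WxH' (width x height in pixels). Assumed reuse scheme,
    not taken from the paper.
    """
    s8x4 = s4[:, 0::2] + s4[:, 1::2]        # merge horizontal neighbours
    s4x8 = s4[0::2, :] + s4[1::2, :]        # merge vertical neighbours
    s8x8 = s8x4[0::2, :] + s8x4[1::2, :]    # 2x2 grid of 8x8 blocks
    s16x8 = s8x8[:, 0] + s8x8[:, 1]         # top and bottom halves
    s8x16 = s8x8[0, :] + s8x8[1, :]         # left and right halves
    s16x16 = s8x8.sum()                     # whole macroblock
    return {
        "4x4": s4, "8x4": s8x4, "4x8": s4x8, "8x8": s8x8,
        "16x8": s16x8, "8x16": s8x16, "16x16": s16x16,
    }


def cycles_per_macroblock(width, height, fps, clock_hz):
    """Average clock cycles available per 16x16 macroblock."""
    macroblocks_per_frame = (width // 16) * (height // 16)
    return clock_hz / (macroblocks_per_frame * fps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cur = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
    ref = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
    sads = variable_block_sads(sad_4x4_grid(cur, ref))
    print("partition SADs per candidate:", sum(np.size(v) for v in sads.values()))
    print("cycles per macroblock:", cycles_per_macroblock(1920, 1088, 30, 130e6))
```

Each candidate position yields 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 partition SADs. A $1920 \times 1088$ frame holds $120 \times 68 = 8160$ macroblocks, so at 30 fps and 130 MHz the average budget is roughly 531 cycles per macroblock, which gives a sense of why a full search needs the degree of parallelism the abstract describes.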
