Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Memory-Hierarchical and Mode-Adaptive HEVC Intra Prediction Architecture for Quad Full HD Video Decoding


Abstract

This paper presents a high-throughput and area-efficient VLSI architecture for intra prediction in the emerging High Efficiency Video Coding (HEVC) standard. Three design techniques are proposed to address the complexity systematically: 1) a hierarchical memory deployment that stores neighboring samples in 4.9 Kb of static RAM (SRAM) instead of 43.2-k gates of registers and increases throughput by processing reference samples in registers; 2) a mode-adaptive scheduling scheme for all prediction units, which provides a throughput of at least 2 samples/cycle while using low-throughput SRAM and achieves 2.46 samples/cycle on average in the experimental results; and 3) resource sharing for multipliers and the readout circuits of reference sample registers, which saves 2.5-k gates. Together, these techniques reduce area by 40% but increase power consumption because of additional signal transitions. Signal-gating circuits are therefore applied to reduce SRAM power by 69% and logic power by 32%, at a cost of only 1.0-k gates. When synthesized at 200 MHz in a 40-nm process, the proposed architecture needs only 27.0-k gates and 4.9 Kb of single-port SRAM. The layout core area is 0.036 $\mathrm{mm}^{2}$, and the power consumption is 2.11 mW in postlayout simulation. This performance supports quad full high-definition (HD) ($3840 \times 2160$) video decoding at 30 frames/s.
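
As a rough sanity check on the throughput claim, the sketch below computes how many samples per cycle a 200 MHz design would need to sustain for $3840 \times 2160$ video at 30 frames/s. It assumes the worst case in which every sample is intra predicted, and it assumes 4:2:0 chroma subsampling (1.5 samples per pixel), which the abstract does not state. The function name and parameters are illustrative only and do not come from the paper.

```python
def required_samples_per_cycle(width, height, fps, clock_hz, samples_per_pixel):
    """Samples the predictor must produce per clock cycle to keep up in real time."""
    samples_per_second = width * height * samples_per_pixel * fps
    return samples_per_second / clock_hz


CLOCK_HZ = 200e6            # synthesis frequency quoted in the abstract
GUARANTEED_THROUGHPUT = 2   # minimum samples/cycle quoted in the abstract

# Luma only (1 sample per pixel).
luma_only = required_samples_per_cycle(3840, 2160, 30, CLOCK_HZ, 1.0)

# Luma plus chroma, ASSUMING 4:2:0 subsampling (1.5 samples per pixel).
yuv420 = required_samples_per_cycle(3840, 2160, 30, CLOCK_HZ, 1.5)

print(f"luma only: {luma_only:.3f} samples/cycle")
print(f"4:2:0:     {yuv420:.3f} samples/cycle")
print("fits within guaranteed throughput:", yuv420 <= GUARANTEED_THROUGHPUT)
```

Under these assumptions the requirement is about 1.87 samples/cycle, which sits below the guaranteed minimum of 2 samples/cycle and well below the reported average of 2.46 samples/cycle.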
