Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

A 48 Cycles/MB H.264/AVC Deblocking Filter Architecture for Ultra High Definition Applications



Abstract

In this paper, a highly parallel deblocking filter architecture for H.264/AVC is proposed that processes one macroblock in 48 clock cycles and supports QFHD@60 fps sequences in real time at less than 100 MHz. To enhance throughput, the architecture uses four edge filters organized in two groups, which process vertical and horizontal edges simultaneously. As parallelism increases, pipeline hazards arise owing to the latency of the edge filters and the data dependencies of the deblocking algorithm. To solve this problem, a zig-zag processing schedule is proposed to eliminate the pipeline bubbles. The data path of the architecture is then derived from the processing schedule and optimized through data flow merging, so as to minimize the cost of logic and internal buffering. Meanwhile, the architecture's data input rate is designed to be identical to its throughput, and the transmission order of the input data matches the zig-zag processing schedule. Therefore, no intercommunication buffer is required between the deblocking filter and the preceding component for speed matching or data reordering. As a result, the design requires only one 24×64 two-port SRAM as its internal buffer. When synthesized in a SMIC 130 nm process, the architecture has a gate count of 30.2 k, which is competitive considering its high performance.
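The headline figures in the abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes QFHD means 3840×2160, that video is 4:2:0, and that each edge filter processes one pixel line across an edge per clock cycle. The function names `required_clock_hz` and `min_cycles_per_mb` are hypothetical helpers introduced only for illustration.

```python
MB_SIZE = 16  # an H.264/AVC macroblock covers 16x16 luma samples


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def required_clock_hz(width: int, height: int, fps: int, cycles_per_mb: int) -> int:
    """Minimum clock frequency needed to deblock every macroblock in real time."""
    mbs_per_frame = ceil_div(width, MB_SIZE) * ceil_div(height, MB_SIZE)
    return mbs_per_frame * fps * cycles_per_mb


def min_cycles_per_mb(num_edge_filters: int) -> int:
    """Lower bound on cycles per macroblock.

    Assumptions (not stated in the abstract): 4:2:0 sampling, no 8x8
    transform, and one pixel line filtered per edge filter per cycle.
    """
    # Luma: 4 edges per direction, each spanning 4 blocks, 2 directions.
    luma_block_edges = 4 * 4 * 2
    # Chroma: 2 edges per direction, each spanning 2 blocks,
    # 2 directions, 2 planes (Cb and Cr).
    chroma_block_edges = 2 * 2 * 2 * 2
    lines_per_block_edge = 4  # each 4x4 block edge crosses 4 pixel lines
    total_lines = (luma_block_edges + chroma_block_edges) * lines_per_block_edge
    return ceil_div(total_lines, num_edge_filters)


if __name__ == "__main__":
    hz = required_clock_hz(3840, 2160, 60, 48)
    print(f"QFHD@60 fps at 48 cycles/MB needs {hz / 1e6:.3f} MHz")
    print(f"Lower bound with 4 edge filters: {min_cycles_per_mb(4)} cycles/MB")
```

Under these assumptions, QFHD@60 fps requires 93.312 MHz, consistent with the "less than 100 MHz" claim, and four fully occupied edge filters give a lower bound of 48 cycles per macroblock. That bound matches the reported figure, which is what a bubble-free schedule would be expected to achieve.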
