IEICE Transactions on Information and Systems

Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC

Abstract

One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to map applications onto them efficiently. The key issues in mapping are (1) how to reduce memory bandwidth, (2) how to exploit the parallelism in algorithms, and (3) how to balance the load and take full advantage of the hardware's potential. In this paper, we propose a novel parallelization scheme, called ‘Hybrid partitioning’, for mapping an H.264 high-definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining the strengths of data partitioning and task partitioning, our methodology consists mainly of three levels, from top to bottom: (1) a hybrid task pipeline at the slice and macroblock (MB) levels; (2) MB row-level data parallelism; and (3) sub-MB-level parallelism. At the sub-MB level, we further propose several mapping strategies, such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave ordering for intra 4×4 prediction, and a parallel processing order for deblocking. These mapping strategies improve the algorithm's performance on REMUS-II. For example, for a 16×16 luma MB, Hybrid VBSMC achieves 4 times the performance of VBSMC and 2.2 times that of the fixed 4×4 partition approach. Finally, we achieve 1080p@33fps H.264 High Profile (HiP) Level 4.1 decoding with REMUS-II running at 200MHz. Compared with typical hardware platforms, our design offers better performance, smaller area, and greater flexibility. For example, its performance improves on that of the commercial CGRA processor XPP-III by approximately 175%, while using only 70% of its area.
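
The 2D-wave ordering named above can be illustrated with a small wavefront scheduler. This is a minimal sketch, not the authors' REMUS-II mapping: it assumes each unit waits for its left, top-left, top and top-right neighbours, which is the dependency set of H.264 intra prediction taken conservatively (the standard actually marks some top-right 4×4 neighbours inside an MB as unavailable). The function names `wavefront_schedule` and `summarize` are hypothetical. Applied to the 4×4 grid of 4×4 luma blocks in one MB, and to the 120×68 MB grid of a 1080p frame (coded as 1088 lines), the same scheduler also suggests why MB row-level parallelism is possible: each row can trail the row above it by two units.

```python
from collections import Counter

# Neighbour offsets (dx, dy) a unit must wait for. Assumed, conservative
# H.264 intra dependency set: left, top-left, top, top-right.
INTRA_DEPS = ((-1, 0), (-1, -1), (0, -1), (1, -1))


def wavefront_schedule(width, height, deps=INTRA_DEPS):
    """Return {(x, y): step}, the earliest step at which each unit can run."""
    step = {}
    # Raster order is safe because every dependency lies above or to the left.
    for y in range(height):
        for x in range(width):
            ready = [step[(x + dx, y + dy)] + 1
                     for dx, dy in deps
                     if (x + dx, y + dy) in step]  # skip out-of-grid neighbours
            step[(x, y)] = max(ready, default=0)
    return step


def summarize(width, height):
    """Return (number of wavefront steps, largest number of units per step)."""
    waves = Counter(wavefront_schedule(width, height).values())
    return len(waves), max(waves.values())


if __name__ == "__main__":
    print("intra 4x4 blocks in one MB:", summarize(4, 4))
    print("MBs in a 1080p frame:", summarize(120, 68))
```

Under these assumptions each unit (x, y) lands on step x + 2y. `summarize(4, 4)` returns `(10, 2)`: the 16 intra 4×4 blocks complete in 10 steps with at most two in flight. `summarize(120, 68)` returns `(254, 60)`: 254 wavefront steps instead of 8160 sequential MBs, with up to 60 MBs ready at once if enough processing units were available.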
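
The headline result of 1080p@33fps at 200MHz can be turned into a per-macroblock cycle budget with simple arithmetic. The sketch below assumes the usual H.264 practice of padding frame dimensions to whole 16×16 MBs, so 1080 lines are coded as 68 MB rows (1088 lines, cropped on output). The function name `mb_cycle_budget` is hypothetical.

```python
import math

MB_SIZE = 16  # H.264 macroblock edge length in luma samples


def mb_cycle_budget(width_px, height_px, fps, clock_hz):
    """Return (MBs per frame, average clock cycles available per MB)."""
    # Pad dimensions up to whole MBs (1920 -> 120 columns, 1080 -> 68 rows).
    mbs_per_frame = math.ceil(width_px / MB_SIZE) * math.ceil(height_px / MB_SIZE)
    cycles_per_mb = clock_hz / (mbs_per_frame * fps)
    return mbs_per_frame, cycles_per_mb


if __name__ == "__main__":
    mbs, cycles = mb_cycle_budget(1920, 1080, 33, 200e6)
    print(f"{mbs} MBs/frame, {cycles:.1f} cycles per MB")
```

This gives 8160 MBs per frame and about 742.7 cycles per MB, so the decoder as a whole must retire one macroblock roughly every 743 cycles on average. For a single pipelined MB stream, every stage would have to fit within roughly this budget; spreading MB rows over several processing units relaxes it proportionally, which is one way to see why load balancing is among the key mapping issues.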
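
One trade-off between predicting partitions at their native variable block sizes and splitting everything into fixed 4×4 blocks is reference-memory traffic, the first of the key mapping issues listed above. H.264 luma interpolation uses a 6-tap filter, so predicting a W×H partition at a fractional motion vector can need up to (W+5)×(H+5) integer reference samples. The sketch below counts that worst case per 16×16 MB for several partition shapes. It only illustrates the bandwidth side; it is not the Hybrid VBSMC algorithm, whose details the abstract does not give, and the names `PARTITIONS` and `ref_samples_per_mb` are hypothetical.

```python
# Luma partition shapes (width, height) of an inter-predicted 16x16 MB.
# Sub-8x8 shapes other than 4x4 (8x4, 4x8) are omitted for brevity.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (4, 4)]

FILTER_TAPS = 6  # length of the H.264 luma half-sample interpolation filter


def ref_samples_per_mb(w, h):
    """Worst-case integer reference samples fetched to predict one 16x16 MB
    split into w x h partitions, each with its own fractional motion vector."""
    partitions = (16 // w) * (16 // h)
    border = FILTER_TAPS - 1  # 2 extra samples before and 3 after, per axis
    return partitions * (w + border) * (h + border)


if __name__ == "__main__":
    for w, h in PARTITIONS:
        print(f"{w}x{h}: {ref_samples_per_mb(w, h)} samples")
```

For example, `ref_samples_per_mb(16, 16)` is 441 samples while `ref_samples_per_mb(4, 4)` is 1296, about 2.9 times as many, because every 4×4 block refetches its own filter border. A fixed 4×4 split presumably trades that extra traffic for a single uniform kernel across all partition shapes.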