Journal: IEEE Transactions on Image Processing

CU Partition Mode Decision for HEVC Hardwired Intra Encoder Using Convolution Neural Network



Abstract

The intensive computation of High Efficiency Video Coding (HEVC) challenges hardwired encoders in terms of hardware overhead and power dissipation. At the same time, the constraints of hardwired encoder design seriously degrade the efficiency of software-oriented fast coding unit (CU) partition mode decision algorithms. A fast algorithm is considered VLSI-friendly when it has the following properties. First, it reduces the maximum complexity of encoding a coding tree unit (CTU). Second, it does not degrade the parallelism of the hardwired encoder. Third, its processing engine has low hardware and power overhead. In this paper, we devise a convolutional neural network (CNN) based fast algorithm that removes at least two CU partition modes in each CTU from full rate-distortion optimization (RDO) processing, thereby reducing the encoder's hardware complexity. Because our algorithm does not depend on correlations among CU depths or spatially neighboring CUs, it is friendly to parallel processing and does not disturb the rhythm of the RDO pipeline. Experiments show that 61.1% of the intra-encoding time is saved on average, while the Bjøntegaard-Delta bit rate increases by 2.67%. Capitalizing on the optimal arithmetic representation, we developed a high-speed [714 MHz under worst-case conditions (125 °C, 0.9 V)] and low-cost (42.5k gates) accelerator for our fast algorithm using TSMC 65-nm CMOS technology. One accelerator supports real-time encoding of HD1080p at 55 frames/s, with a corresponding power dissipation of 16.2 mW at 714 MHz. Finally, our accelerator scales well: four accelerators fulfill the throughput requirements of UltraHD-4K at 55 frames/s.
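
The scaling claim at the end of the abstract can be checked with simple arithmetic. The sketch below is not taken from the paper. It assumes the standard frame sizes of 1920x1080 for HD1080p and 3840x2160 for UltraHD-4K (the abstract does not state them), and it assumes that one accelerator is exactly saturated by HD1080p at 55 frames/s. The cycle budget per pixel is a derived estimate under those assumptions, not a figure reported by the authors.

```python
# Back-of-the-envelope check of the throughput scaling in the abstract.
# Assumption: standard frame sizes, which the abstract does not spell out.
HD1080P = (1920, 1080)
UHD_4K = (3840, 2160)
FPS = 55            # frame rate quoted in the abstract
CLOCK_HZ = 714e6    # worst-case clock frequency quoted in the abstract


def pixel_rate(resolution, fps):
    """Pixels per second that must be handled for real-time encoding."""
    width, height = resolution
    return width * height * fps


def accelerators_needed(target, baseline, fps):
    """Units needed, assuming one unit exactly sustains `baseline` at `fps`."""
    target_rate = pixel_rate(target, fps)
    baseline_rate = pixel_rate(baseline, fps)
    return -(-target_rate // baseline_rate)  # integer ceiling division


hd_rate = pixel_rate(HD1080P, FPS)
n_units = accelerators_needed(UHD_4K, HD1080P, FPS)

# Derived estimate (assumption): clock cycles available per luma pixel
# if the accelerator is fully busy at HD1080p, 55 frames/s.
cycles_per_pixel = CLOCK_HZ / hd_rate

print(f"HD1080p pixel rate: {hd_rate} pixels/s")
print(f"Accelerators for UltraHD-4K: {n_units}")
print(f"Cycle budget per pixel: {cycles_per_pixel:.2f}")
```

Because an UltraHD-4K frame holds exactly four times as many pixels as an HD1080p frame, the count of four accelerators matches the figure in the abstract under the assumptions above.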
