Conference: International Conference on Image, Video Processing and Artificial Intelligence

Highly parallel GPU accelerator for HEVC transform and quantization

Abstract

Analysis of today's Internet traffic shows that digital video content dominates. Its share will continue to grow in the coming years and reach 82% of all traffic by 2021. Expressed in Internet video minutes, this equals about one million minutes of video per second. Video processing devices are therefore expected to provide and support improved compression capability, which will relieve the pressure on storage systems and communication networks while creating the preconditions for further development of video services. Transform and quantization is one of the most compute-intensive parts of modern hybrid video coding systems, in which the coding algorithm itself is commonly standardized. High Efficiency Video Coding (HEVC) is a state-of-the-art video coding standard that achieves high compression efficiency at the cost of high computational complexity. In this paper we present a highly parallel GPU accelerator for HEVC transform and quantization that targets the most common heterogeneous computing system, CPU+GPU. The accelerator is implemented using the CUDA programming model. All relevant state-of-the-art techniques related to kernel vectorization, shared memory optimization, and overlapping data transfers with computation were investigated, customized, and carefully combined to obtain a performance-efficient solution across all applicable transform sizes. The proposed solution is compared against a reference implementation that uses the NVIDIA cuBLAS library to perform the same work. The speedup factors obtained for a DCI 4K frame are 2.46 times for the largest transform size and 130.17 times for the smallest transform size, which reveals a substantial performance gap in this library when targeting a GPU of the Kepler architecture. The achieved processing times for frame transform and quantization are up to 4.82 ms.
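Because the reference implementation performs the same work with cuBLAS, the transform stage can be read as a pair of matrix products per block. The following is a minimal CPU-side sketch in Python of that view for a single 4x4 block; it is not the paper's CUDA kernel. The matrix `C4` is the 4x4 integer core transform basis of HEVC. The function names, the `step` parameter, and the simple round-to-nearest quantizer are assumptions made for illustration, and the intermediate scaling shifts and QP-dependent quantizer of the standard are deliberately omitted.

```python
# Sketch only: 2-D forward transform of one 4x4 residual block written as
# two matrix products, Y = C * X * C^T, followed by uniform scalar
# quantization. Names and the quantizer are hypothetical, not the paper's.

# 4x4 integer approximation of the DCT basis used by HEVC.
C4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def forward_transform(block, basis=C4):
    """Unnormalized 2-D transform: basis * block * basis^T.

    The standard's intermediate right shifts are omitted, so the output is
    scaled up relative to a normalized transform.
    """
    return matmul(matmul(basis, block), transpose(basis))


def quantize(coeffs, step):
    """Sign-symmetric round-to-nearest division by a hypothetical step size."""
    def q(c):
        mag = (abs(c) + step // 2) // step
        return mag if c >= 0 else -mag
    return [[q(c) for c in row] for row in coeffs]


def blocks_per_frame(width, height, n):
    """Number of whole n x n luma blocks in a frame (partial blocks ignored)."""
    return (width // n) * (height // n)
```

For a flat block in which every sample equals 10, only the top-left (DC) coefficient is nonzero, since every basis row except the first sums to zero. Each block is transformed independently, and a 4096x2160 DCI 4K luma plane contains 552,960 blocks of size 4x4, which gives a sense of how much independent work is available to spread across GPU threads at the smallest transform size.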
