
Improved SIMD architecture for high performance video processors


Abstract

Single instruction, multiple data (SIMD) execution is undoubtedly an efficient way to exploit the data-level parallelism in image and video applications. However, SIMD execution bottlenecks must be tackled to achieve high execution efficiency. In this paper, we first analyze the implementation of two major kernel functions of H.264/AVC, namely SATD (sum of absolute transformed differences) and sub-pel interpolation, on conventional SIMD architectures to identify the bottlenecks of traditional approaches. Based on this analysis, we propose a new SIMD architecture with two novel features: 1) a parallel memory structure that supports variable block sizes and word lengths, and 2) a configurable SIMD structure. The proposed parallel memory structure gives programmers great flexibility to access data in different block sizes and word lengths. The configurable SIMD structure allows almost random register file access and slightly different operations in the ALUs inside the SIMD unit. These features greatly benefit the realization of H.264/AVC kernel functions. For instance, fractional motion estimation, particularly half-to-quarter-pixel interpolation, can now be executed with minimal or no additional memory access. Compared with conventional SIMD systems, the proposed architecture achieves a further speedup of 2.1X to 4.6X when implementing H.264/AVC kernel functions. Based on Amdahl's law, the overall speedup of an H.264/AVC encoding application is projected to be 2.46X. We expect that significant improvement can also be achieved when the proposed architecture is applied to other image and video processing applications.
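
For readers unfamiliar with SATD, one of the two kernels the abstract analyzes, the following is a minimal scalar sketch of a 4x4 SATD computation. It is not the paper's implementation and says nothing about the proposed architecture; the names `H4`, `matmul4`, and `satd_4x4` are hypothetical, and the result is left un-normalized (some encoders scale the sum, which is omitted here as an assumption). The additions and subtractions in the Hadamard transform are the kind of regular, data-parallel work that SIMD hardware targets.

```python
# 4x4 Hadamard matrix with entries +1/-1. It is symmetric, so H4^T == H4.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def satd_4x4(cur, ref):
    """Un-normalized SATD of two 4x4 blocks (lists of 4 rows of 4 ints)."""
    # Residual between the current block and the reference block.
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    # 2-D Hadamard transform of the residual: H4 * diff * H4^T.
    transformed = matmul4(matmul4(H4, diff), H4)
    # Sum of absolute values of the transformed coefficients.
    return sum(abs(v) for row in transformed for v in row)
```

As a usage note, `satd_4x4(block, block)` is 0 for any block, since the residual is all zeros and the transform is linear.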
