Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP

Abstract

Multimedia applications have become increasingly important in daily computing. These applications are composed of heterogeneous regions of code that mix data-level parallelism (DLP) and instruction-level parallelism (ILP). A standard approach to multimedia coprocessor design integrates single-instruction multiple-data (SIMD) engines into architectures that exploit ILP at compile time, such as very long instruction word (VLIW) and transport triggered architecture (TTA). However, the ILP regions fail to scale as the vector length is increased to achieve high performance in the DLP regions. Furthermore, because SIMD instructions operate register to register, current SIMD engines are limited in handling memory alignment, data reorganization, and control flow. Many supporting instructions, such as data permutations, address generation, and loop branches, are required to aid the execution of the actual SIMD computation instructions. To mitigate these problems, we propose optimized SIMD engines that combine VLIW or TTA processing with unified scalar and long-vector computation, together with efficient SIMD hardware for the actual computation. Our new architecture is based on TTA and is called the multimedia coprocessor (MCP). It has the following features: (1) a simple coprocessor structure with an 8-way TTA; (2) cost-effective SIMD hardware capable of performing floating-point operations; (3) long-vector capabilities built upon the existing SIMD hardware, with a single register file and processor data path shared by scalar operands and vector elements; and (4) an optimized SIMD architecture that addresses the SIMD limitations above. Our experimental evaluations show that MCP outperforms conventional SIMD techniques by an average of 39% on multimedia kernels and 12% on multimedia applications.
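The abstract's point about supporting instructions can be made concrete with a toy instruction count. The Python sketch below is a hypothetical model, not MCP and not any real ISA: it assumes a W-wide register-to-register SIMD engine (W = 4 in the usage) with no unaligned load, so a misaligned vector access is built from two aligned loads plus one permute, a common software realignment scheme. The names `OpCounts`, `load_vector`, and `simd_add`, the instruction categories, and the one-address-update and one-branch-per-iteration accounting are all illustrative assumptions; stores and tail handling are omitted.

```python
from dataclasses import dataclass


@dataclass
class OpCounts:
    # Hypothetical instruction categories for this toy model only.
    compute: int = 0  # "real" SIMD computation instructions (vector adds)
    load: int = 0     # aligned vector loads
    permute: int = 0  # realignment permutes (data reorganization)
    addr: int = 0     # address-generation (index/pointer update) instructions
    branch: int = 0   # loop-control branches

    def supporting(self) -> int:
        """Every issued instruction that is not real computation."""
        return self.load + self.permute + self.addr + self.branch


def load_vector(mem: list, addr: int, width: int, counts: OpCounts) -> list:
    """Read `width` elements starting at `addr` using only aligned loads.

    Assumed ISA: no unaligned load, so a misaligned access becomes two
    aligned loads plus one permute that extracts the straddling elements.
    """
    offset = addr % width
    base = addr - offset
    lo = mem[base:base + width]
    counts.load += 1
    if offset == 0:
        return lo
    hi = mem[base + width:base + 2 * width]
    counts.load += 1
    counts.permute += 1
    return (lo + hi)[offset:offset + width]


def simd_add(a: list, b: list, start: int, n: int, width: int):
    """Compute a[start+i] + b[start+i] for i in range(n) on a width-wide engine.

    Assumes n is a multiple of width and that a and b are padded so the
    second aligned load never runs past the end; stores are not modeled.
    """
    assert n % width == 0, "tail handling is omitted in this sketch"
    counts = OpCounts()
    out = []
    for i in range(start, start + n, width):
        va = load_vector(a, i, width, counts)
        vb = load_vector(b, i, width, counts)
        out.extend(x + y for x, y in zip(va, vb))
        counts.compute += 1  # one vector add per iteration
        counts.addr += 1     # advance the shared index/pointer
        counts.branch += 1   # loop back-edge
    return out, counts


if __name__ == "__main__":
    a = list(range(16))
    b = [10] * 16
    c, counts = simd_add(a, b, start=1, n=8, width=4)  # misaligned start
    print(c)       # [11, 12, 13, 14, 15, 16, 17, 18]
    print(counts)  # OpCounts(compute=2, load=8, permute=4, addr=2, branch=2)
```

In this toy model, a misaligned start pairs the two vector adds with 16 supporting instructions (8 loads, 4 permutes, 2 address updates, 2 branches), while an aligned start needs 8. Under these assumptions, that ratio illustrates the kind of overhead the abstract says MCP's optimized SIMD engines are meant to reduce.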

Bibliographic details

  • Source
    Parallel Computing, 2013, No. 10, pp. 586-602 (17 pages)
  • Author affiliations

    School of Computer, National University of Defense Technology, Changsha 410073, China (shared by all five authors)

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of full text: English
  • Keywords

    Scalar; Vector; SIMD; VLIW; TTA; Multimedia coprocessor;
