Venue: International Conference on Computer Design

Evaluating signal processing and multimedia applications on SIMD, VLIW and superscalar architectures


Abstract

This paper aims to provide a quantitative understanding of the performance of DSP and multimedia applications on very long instruction word (VLIW), single instruction multiple data (SIMD), and superscalar processors. We evaluate the performance of the VLIW paradigm using Texas Instruments Inc.'s TMS320C62xx processor and the SIMD paradigm using Intel's Pentium II processor (with MMX) on a set of DSP and media benchmarks. Tradeoffs in superscalar performance are evaluated with a combination of measurements on the Pentium II and simulation experiments on the SimpleScalar simulator. Our benchmark suite includes kernels (filtering, autocorrelation, and dot product) and applications (audio effects, G.711 speech coding, and speech compression). Optimized assembly libraries and compiler intrinsics were used to create the SIMD and VLIW code. We used the hardware performance counters on the Pentium II and the stand-alone simulator for the C62xx to obtain execution cycle counts. Relative to non-SIMD Pentium II performance, the SIMD version achieves speedups ranging from 1.0 to 5.5, while the speedup of the VLIW version ranges from 0.63 to 9.0. The benchmarks are found to contain large amounts of available parallelism; however, most of it is inter-iteration parallelism. Out-of-order execution and branch prediction are observed to be extremely important for exploiting such parallelism in media applications.
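The abstract names three benchmark kernels (dot product, autocorrelation, filtering) and attributes most of the available parallelism to independent loop iterations. The following is a minimal sketch, assuming generic textbook definitions of these kernels rather than the paper's actual benchmark code. The function names are hypothetical, and Python integers stand in for the 16-bit fixed-point samples and wider accumulators that real SIMD or VLIW code would use.

```python
from typing import List, Sequence

# Hypothetical reference kernels in the spirit of the benchmark suite.
# Generic textbook forms, not the paper's code; Python ints stand in for
# 16-bit fixed-point samples (no overflow or saturation is modeled).


def dot_product(x: Sequence[int], y: Sequence[int]) -> int:
    # A single reduction: every iteration updates the same accumulator,
    # so this loop carries a dependence through `acc`.
    acc = 0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def dot_product_two_acc(x: Sequence[int], y: Sequence[int]) -> int:
    # Same result using two independent partial sums: the kind of
    # restructuring that lets SIMD lanes or VLIW slots work in parallel.
    acc0 = acc1 = 0
    n = min(len(x), len(y))
    i = 0
    while i + 1 < n:
        acc0 += x[i] * y[i]
        acc1 += x[i + 1] * y[i + 1]
        i += 2
    if i < n:  # leftover element when n is odd
        acc0 += x[i] * y[i]
    return acc0 + acc1


def autocorrelation(x: Sequence[int], max_lag: int) -> List[int]:
    # r[k] = sum_{i=k}^{n-1} x[i] * x[i-k]. Each lag k is computed
    # independently of every other lag (inter-iteration parallelism).
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def fir_filter(x: Sequence[int], h: Sequence[int]) -> List[int]:
    # y[i - taps + 1] = sum_j h[j] * x[i - j] for i >= taps - 1
    # (valid outputs only). Each output sample is independent of the others.
    taps = len(h)
    return [
        sum(h[j] * x[i - j] for j in range(taps))
        for i in range(taps - 1, len(x))
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    print(dot_product(x, x))          # 204
    print(dot_product_two_acc(x, x))  # 204
    print(autocorrelation(x, 2))      # [204, 168, 133]
    print(fir_filter(x, [1, 1, 1]))   # [6, 9, 12, 15, 18, 21]
```

The two dot-product variants return the same value; the second only makes the independent work explicit. In the autocorrelation and FIR kernels, the outer loop (over lags or output samples) has no loop-carried dependence, so different iterations could in principle execute concurrently. This is one concrete form of the inter-iteration parallelism described above.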
