IEEE Transactions on Computers

Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements


Abstract

Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match well with the access patterns and loop structures of media programs. We find that 75 to 85 percent of the dynamic instructions in the processor instruction stream are supporting instructions needed to feed the SIMD execution units rather than true, useful computations. As a result, the SIMD execution units are underutilized: only 1 to 12 percent of their peak throughput is achieved. Rather than exploiting more data-level parallelism (DLP), we focus on the instructions that support the SIMD computations and exploit both fine-grained and coarse-grained instruction-level parallelism (ILP) in the supporting instruction stream. We propose the MediaBreeze architecture, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.). Our results on multimedia kernels show that a 2-way processor with SIMD extensions enhanced with MediaBreeze outperforms a 16-way processor with current SIMD extensions. On application benchmarks, a 2-/4-way processor with SIMD extensions augmented with MediaBreeze outperforms a 4-/8-way processor with SIMD extensions. A first-order approximation using ASIC synthesis tools and cell-based libraries shows that this acceleration costs a 10 percent increase over the area required by the MMX and SSE extensions (a 0.3 percent increase in overall chip area) and 1 percent of total processor power consumption.
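
To make the notion of "supporting instructions" concrete, the following minimal sketch tallies a hypothetical per-iteration instruction mix for a SIMD inner loop and computes the share that only feeds the SIMD units. The opcode names, the `LOOP_BODY` list, and the `supporting_fraction` helper are illustrative assumptions, not data or code from the paper; the mix is simply chosen to fall inside the 75 to 85 percent range the abstract reports.

```python
from collections import Counter

# Hypothetical instruction mix for one iteration of a SIMD inner loop.
# Opcode names and counts are illustrative assumptions, not measurements
# taken from the paper.
LOOP_BODY = [
    ("load", "support"),            # fetch first packed operand
    ("load", "support"),            # fetch second packed operand
    ("simd_multiply", "useful"),    # true computation
    ("simd_add", "useful"),         # true computation
    ("pack", "support"),            # data reorganization
    ("store", "support"),           # write packed result
    ("addr_increment", "support"),  # address generation (input pointer)
    ("addr_increment", "support"),  # address generation (output pointer)
    ("loop_compare", "support"),    # loop bookkeeping
    ("loop_branch", "support"),     # loop bookkeeping
]


def supporting_fraction(body):
    """Return the fraction of instructions in `body` tagged as supporting."""
    counts = Counter(kind for _, kind in body)
    total = sum(counts.values())
    return counts["support"] / total


if __name__ == "__main__":
    fraction = supporting_fraction(LOOP_BODY)
    print(f"supporting instructions: {fraction:.1%}")
```

In this toy mix, only the two SIMD arithmetic operations count as useful work; address generation, looping, and data reorganization are the categories the abstract says MediaBreeze moves into dedicated hardware.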
