Conference: Advanced parallel processing technologies

Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices

Abstract

SIMD extensions are among the most common and effective techniques for exploiting data-level parallelism in today's processor designs. However, the performance of SIMD architectures is limited by constraints such as the mismatch between storage and computational formats and the use of data permutation instructions during vectorization. In our previous work we proposed two architectural modifications, extended subwords and the Matrix Register File (MRF), to alleviate these limitations. Extended subwords use four extra bits for every byte in a media register, which provides additional parallelism. The MRF allows flexible row-wise as well as column-wise access to the register file, which eliminates data permutation instructions. We have validated the combination of the proposed techniques by studying the performance of some multimedia kernels. In this paper, we analyze each proposed technique separately and answer the following questions: how much of the performance gain is a result of the additional parallelism, and how much is due to the elimination of data permutation instructions? The results show that employing the MRF and extended subwords separately yields speedups of less than 1 and 1.15, respectively. In other words, using either extended subwords or the MRF alone is insufficient to eliminate most pack/unpack and rearrangement overhead instructions on SIMD processors. The combination of both techniques, on the other hand, yields much greater performance benefits than either technique alone.
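
The two overheads named in the abstract can be illustrated with a small software model. The sketch below is not the authors' instruction set or simulator: the function names, the list-based register model, and the 4x4 register block are all assumptions made for illustration. It shows why 8-bit lanes force unpacking to a wider format before arithmetic, how 12-bit lanes (8 storage bits plus 4 extra bits) would keep the exact sum, and how a column-wise read stands in for an explicit transpose.

```python
# Illustrative model only; names and register layout are hypothetical,
# not taken from the paper.

def add_packed_u8(a, b):
    """Lane-wise add in 8-bit lanes: sums above 255 wrap around."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def add_extended(a, b, bits=12):
    """Lane-wise add in wider lanes (assumed 8 + 4 extra bits per byte)."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]

def read_row(mrf, i):
    """Row-wise read of a register block modelled as a list of rows."""
    return list(mrf[i])

def read_column(mrf, j):
    """Column-wise read: takes element j of every row, no permutation step."""
    return [row[j] for row in mrf]

pixels_a = [200, 100, 255, 0]
pixels_b = [100, 100, 255, 7]

wrapped = add_packed_u8(pixels_a, pixels_b)  # 8-bit lanes overflow
exact = add_extended(pixels_a, pixels_b)     # 12-bit lanes keep the sum

block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
second_column = read_column(block, 1)

print(wrapped, exact, second_column)
```

On a conventional SIMD unit the overflow in `wrapped` is usually avoided by unpacking bytes to 16-bit lanes, which halves the number of elements per register and adds pack/unpack instructions; the wider lanes in `add_extended` model the idea of keeping the element count unchanged.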

Bibliographic record

  • Source
  • Conference location: Rapperswil (CH)
  • Author affiliations

    Computer Engineering Laboratory, Delft University of Technology, 2628 CD Delft, The Netherlands; Department of Computer Engineering, Faculty of Engineering, University of Guilan, Rasht, Iran

    Computer Engineering Laboratory, Delft University of Technology, 2628 CD Delft, The Netherlands

  • Conference organizer
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Theory, methods
  • Keywords
