International Conference on Parallel Processing and Applied Mathematics (PPAM 2010)

Adaptation of Double-Precision Matrix Multiplication to the Cell Broadband Engine Architecture

Abstract

This paper presents an approach to adapting double-precision matrix multiplication to the architecture of Cell processors. On a single SPE, the algorithm is based on the operation C = C + A*B performed on matrices of size 64x64; these matrices are further divided into smaller submatrices, each of which corresponds to one micro-kernel operation. Our approach relies on a performance model constructed as a function of the submatrix size. The model accounts for factors such as the size of the local storage, the number of registers, the properties of double-precision operations, and the balance between pipelines. This allows us to take into account the properties of both the first generation of Cell processors and its successor, the PowerXCell 8i. The adaptation is followed by an optimization phase that includes loop transformations, a kernel implementation with SIMD instructions, and other transformations necessary to balance the even and odd pipelines. Finally, we present hand-tuning performed with the IBM Assembly Visualizer tool. The proposed adaptation and optimizations allow us to achieve about 96% of the peak performance.
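The blocking scheme described in the abstract can be illustrated with a minimal sketch in portable C. This is not the authors' implementation: the abstract does not give the submatrix size chosen by the performance model, so the 4x4 tile (`MR`, `NR`) is purely an assumption, and the names `micro_kernel` and `dgemm_block_64` are hypothetical. The sketch uses plain scalar code rather than SPU SIMD instructions and makes no attempt at pipeline balancing; it only shows how a 64x64 update C = C + A*B decomposes into micro-kernel calls on smaller submatrices.

```c
#include <assert.h>
#include <stddef.h>

#define N  64  /* block size stated in the abstract */
#define MR 4   /* micro-kernel tile rows: assumed, not from the paper */
#define NR 4   /* micro-kernel tile columns: assumed, not from the paper */

#if (N % MR) != 0 || (N % NR) != 0
#error "tile sizes must divide the block size"
#endif

/* Hypothetical micro-kernel: updates the MR x NR tile of C whose top-left
 * corner is (i0, j0), accumulating over the full inner dimension.
 * All matrices are N x N, row-major. */
static void micro_kernel(const double *A, const double *B, double *C,
                         size_t i0, size_t j0)
{
    double acc[MR][NR];

    /* Load the C tile into local accumulators (stand-ins for registers). */
    for (size_t i = 0; i < MR; i++)
        for (size_t j = 0; j < NR; j++)
            acc[i][j] = C[(i0 + i) * N + (j0 + j)];

    /* One rank-1 update of the tile per step of the inner dimension k. */
    for (size_t k = 0; k < N; k++) {
        for (size_t i = 0; i < MR; i++) {
            double a = A[(i0 + i) * N + k];
            for (size_t j = 0; j < NR; j++)
                acc[i][j] += a * B[k * N + (j0 + j)];
        }
    }

    /* Store the accumulated tile back to C. */
    for (size_t i = 0; i < MR; i++)
        for (size_t j = 0; j < NR; j++)
            C[(i0 + i) * N + (j0 + j)] = acc[i][j];
}

/* C = C + A*B for 64x64 row-major matrices, tiled into micro-kernel calls. */
void dgemm_block_64(const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i0 = 0; i0 < N; i0 += MR)
        for (size_t j0 = 0; j0 < N; j0 += NR)
            micro_kernel(A, B, C, i0, j0);
}
```

Calling `dgemm_block_64(A, B, C)` on three arrays of `N * N` doubles updates `C` in place. Keeping the tile in a small accumulator array mirrors the idea that the submatrix size is bounded by the number of available registers, which is one of the factors the abstract says the performance model takes into account.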
