...
首页> 外文期刊>IEEE Transactions on Parallel and Distributed Systems >Recursive array layouts and fast matrix multiplication
【24h】

Recursive array layouts and fast matrix multiplication

机译:递归数组布局和快速矩阵乘法

获取原文
           

摘要

The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication and the more complex algorithms of Strassen (1969) and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2-2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between 10 percent and 20 percent. Carrying the recursive layout down to the level of individual matrix elements is shown to be counterproductive; a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance. Five recursive layouts with successively increasing complexity of address computation are evaluated and it is shown that addressing overheads can be kept in control even for the most computationally demanding of these layouts.
机译:矩阵乘法的串行和并行实现的性能对内存系统行为高度敏感。错误共享和高速缓存冲突会导致传统的列主阵列或行主阵列布局随着矩阵大小的变化而在内存系统性能方面引起很大的变化。本文研究了递归数组布局的使用,以提高性能并减少可变性。递归矩阵乘法的先前工作已扩展为检查几种递归数组布局和三种递归算法:标准矩阵乘法以及Strassen(1969)和Winograd的更复杂的算法。尽管对于标准算法,递归布局明显优于传统布局(将执行时间减少了1.2-2.5倍),但对于Strassen和Winograd的算法而言,它们几乎没有改进。对于纯顺序实现,可以对计算重新排序以节省内存空间并在10%到20%之间提高性能。将递归布局降低到单个矩阵元素的水平被证明是适得其反的。递归布局到规范排序的矩阵图块的组合反而会产生更高的性能。评估了五个递归布局,这些递归布局的地址计算复杂度不断提高,并且显示出即使对于这些布局中最需要计算的情况,也可以控制寻址开销。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号