In this paper we report on the development of an efficient and portable implementation of Strassen's matrix multiplication algorithm for matrices of arbitrary size. Our criterion for stopping the recursion is more detailed than those generally used, which improves performance over a larger set of input sizes. In addition, we handle odd matrix dimensions with a method whose usefulness had previously been questioned and had not yet been demonstrated. Our memory requirements are also lower than those of other similar implementations, in certain cases by 40 to more than 70 percent. We measure the performance of our code on the IBM RS/6000, the CRAY YMP C90, and a single processor of the CRAY T3D, and compare it with other codes. Finally, we demonstrate the usefulness of our implementation by using it to perform the matrix multiplications in a large application code.
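The abstract names two ingredients, a criterion for stopping the recursion and a way to handle odd dimensions, but does not give details. The following is a minimal Python sketch under stated assumptions, not the authors' code. It uses Winograd's variant of Strassen's algorithm (7 block products and 15 block additions per level). It stops the recursion with a simple, hypothetical single threshold: switch to the classical triple loop once any dimension is at most `cutoff`. The paper describes its own criterion as more detailed than this. For odd dimensions it assumes dynamic peeling: recurse on the largest even-sized core, then fix up the peeled row, the peeled column, and a rank-1 term. The function names `multiply`, `winograd_even`, `classical` and the parameter `cutoff` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]  # integer entries also work and make results exact


def classical(A: Matrix, B: Matrix) -> Matrix:
    """Conventional triple-loop product, used below the recursion cutoff."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)] for row in A]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def quadrants(X: Matrix):
    """Split a matrix with even row and column counts into four equal blocks."""
    r, c = len(X) // 2, len(X[0]) // 2
    return ([row[:c] for row in X[:r]], [row[c:] for row in X[:r]],
            [row[:c] for row in X[r:]], [row[c:] for row in X[r:]])


def winograd_even(A: Matrix, B: Matrix, cutoff: int) -> Matrix:
    """One level of Winograd's variant on even-sized operands: 7 products, 15 additions."""
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(B21, T2)
    # The seven recursive block products.
    M1 = multiply(A11, B11, cutoff)
    M2 = multiply(A12, B21, cutoff)
    M3 = multiply(S4, B22, cutoff)
    M4 = multiply(A22, T4, cutoff)
    M5 = multiply(S1, T1, cutoff)
    M6 = multiply(S2, T2, cutoff)
    M7 = multiply(S3, T3, cutoff)
    # Combine the products into the four blocks of C.
    U2 = add(M1, M6)
    U3 = add(U2, M7)
    C11 = add(M1, M2)
    C12 = add(add(U2, M5), M3)
    C21 = add(U3, M4)
    C22 = add(U3, M5)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def multiply(A: Matrix, B: Matrix, cutoff: int = 64) -> Matrix:
    """C = A*B for an m-by-k A and a k-by-n B of any sizes.

    `cutoff` is a hypothetical single threshold, not the paper's criterion.
    """
    m, k, n = len(A), len(B), len(B[0])
    if min(m, k, n) <= max(cutoff, 1):
        return classical(A, B)
    me, ke, ne = m - m % 2, k - k % 2, n - n % 2  # largest even sizes
    C = winograd_even([row[:ke] for row in A[:me]], [row[:ne] for row in B[:ke]], cutoff)
    if k % 2:  # odd inner dimension: rank-1 update with A's last column and B's last row
        for i in range(me):
            a = A[i][k - 1]
            C[i] = [c + a * b for c, b in zip(C[i], B[k - 1][:ne])]
    if n % 2:  # odd column count: append the last column for the first me rows
        for i in range(me):
            C[i].append(sum(A[i][p] * B[p][n - 1] for p in range(k)))
    if m % 2:  # odd row count: append the complete last row
        C.append([sum(A[m - 1][p] * B[p][j] for p in range(k)) for j in range(n)])
    return C
```

For example, `multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1)` takes one Winograd step and returns `[[19, 22], [43, 50]]`. Plain Python lists keep the sketch dependency-free. A practical code would call a tuned Level 3 BLAS routine such as DGEMM below the cutoff, and would reuse workspace instead of allocating a new list for every temporary.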
Keywords: matrix multiplication, Strassen's algorithm, Winograd variant, Level 3 BLAS