首页> 外文会议> >Fast and scalable parallel algorithms for matrix chain product and matrix powers on distributed memory systems
【24h】

Fast and scalable parallel algorithms for matrix chain product and matrix powers on distributed memory systems

机译:快速且可扩展的并行算法,用于分布式存储系统上的矩阵链乘积和矩阵功率

获取原文

摘要

Given N matrices A/sub 1/, A/sub 2/,..., A/sub N/ of size N/spl times/N, the matrix chain product problem is to compute A/sub 1//spl times/A/sub 2//spl times//spl middot//spl middot//spl middot/A/sub N/. Given an N/spl times/N matrix A, the matrix powers problem is to calculate the first N powers of A, i.e., A, A/sup 2/A/sup 3/,..., A/sup N/. We consider distributed memory systems (DMS) with p processors that can support one-to-one communications in O(T(p)) time. Assume that the time complexity of the best known sequential algorithm for matrix multiplication is O(N/sup /spl alpha//), where /spl alpha/>2.3755. Let p be arbitrarily chosen in the range 1/spl les/p/spl les/N/sup /spl alpha/+1//log N. We show that the two problems can be solved on a p-processor DMS in T/sub chain/(N,p)=O(N/sup /spl alpha/+1//p+T(p)(N/sup 2(1+1//spl alpha/)//p/sup 2// /sup /spl alpha//(log p/N)/sup 1-2//spl alpha//+log(p log N/N/sup /spl alpha//) log N)) and T/sub power/(N,p)=0(N/sup /spl alpha/+1//p+T(p)(N/sup 2(1+1//spl alpha/)//p/sup 2// /sup /spl alpha//(log p/log N)/sup 1-2//spl alpha//+(log N)/sup 2/)) times, respectively. We also give instantiation of the above results in distributed memory parallel computers and DMS with hypercubic networks, and show that our parallel algorithms are either fully scalable or highly scalable.
机译:给定N个矩阵A / sub 1 /,A / sub 2 /,...,A / sub N /的大小N / spl次/ N,矩阵链乘积问题就是计算A / sub 1 // spl次/ A / sub 2 // spl次// spl middot // spl middot // spl middot / A / sub N /。给定一个N / spl次/ N个矩阵A,矩阵幂问题就是计算A的前N个幂,即A,A / sup 2 / A / sup 3 /,...,A / sup N /。我们考虑具有p个处理器的分布式存储系统(DMS),它们可以支持O(T(p))时间的一对一通信。假设最著名的矩阵乘法顺序算法的时间复杂度为O(N / sup / spl alpha //),其中/ spl alpha /> 2.3755。令p在1 / spl les / p / spl les / N / sup / spl alpha / + 1 // log N范围内任意选择。我们证明可以在T /子链/(N,p)= O(N / sup / spl alpha / + 1 // p + T(p)(N / sup 2(1 + 1 // spl alpha /)// p / sup 2 / / / sup / spl alpha //(log p / N)/ sup 1-2 // spl alpha // + log(p log N / N / sup / spl alpha //)log N))和T / sub电源/(N,p)= 0(N / sup / spl alpha / + 1 // p + T(p)(N / sup 2(1 + 1 // spl alpha /)// p / sup 2 // / / sup / spl alpha //(log p / log N)/ sup 1-2 // spl alpha // +(log N)/ sup 2 /))次。我们还给出了在具有超立方网络的分布式内存并行计算机和DMS中上述结果的实例,并表明我们的并行算法是完全可伸缩的或高度可伸缩的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号