Via novel path-routing techniques we prove a lower bound on the I/O-complexity of all recursive matrix multiplication algorithms computed in serial or in parallel and show that it is tight for all square and near-square matrix multiplication algorithms. Previously, tight lower bounds were known only for the classical theta (n3) matrix multiplication algorithm and those similar to Strassen's algorithm that lack multiple vertex copying. We first prove tight lower bounds on the I/O-complexity of Strassen-like algorithms, under weaker assumptions, by constructing a routing of paths between the inputs and outputs of sufficiently small subcomputations in the algorithm's CDAG. We then further extend this result to all recursive divide-and-conquer matrix multiplication algorithms, and show that our lower bound is optimal for algorithms formed from square and nearly square recursive steps. This requires combining our new path-routing approach with a secondary routing based on the Loomis-Whitney Inequality technique used to prove the optimal I/O-complexity lower bound for classical matrix multiplication.
展开▼