We consider the load-balanced multiplication of a large sparse matrix with a large sequence of vectors on parallel computers. To address the associated computational and inter-node communication challenges, we propose a method that combines fast load-balanced work allocation with efficient message-passing implementations. We evaluate the proposed method on benchmark matrices as well as on synthetically generated matrix data, and compare our allocation scheme with previous work. The results show that our approach yields a tangible improvement over prior work, particularly for very sparse and skewed matrices.