Conference: 21st Annual Symposium on Parallelism in Algorithms and Architectures, 2009

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Abstract

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/(√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more-standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
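
To make the idea of a block-based layout concrete, here is a minimal serial sketch in Python. It is not the paper's implementation: the function names (`build_csb`, `csb_multiply`), the dictionary layout, the default block size of roughly √n, and the plain insertion order of nonzeros inside each block are all assumptions made for illustration, and the parallel scheduling that gives the stated span is omitted. The sketch only shows why a block-major layout treats Ax and Aᵀx symmetrically: the same pass over the stored nonzeros serves both products, with the roles of row and column swapped.

```python
import math


def build_csb(n, triples, beta=None):
    """Pack (row, col, value) triples of an n x n matrix into a block-major layout.

    Hypothetical helper: nonzeros are grouped into beta x beta blocks and
    stored with block-local row and column offsets.
    """
    if beta is None:
        beta = max(1, math.isqrt(n))  # assumed block size close to sqrt(n)
    nb = -(-n // beta)  # number of blocks per dimension (ceiling division)

    # Bucket each nonzero by the block it falls into, keeping local offsets.
    buckets = [[] for _ in range(nb * nb)]
    for r, c, v in triples:
        buckets[(r // beta) * nb + (c // beta)].append((r % beta, c % beta, v))

    # Flatten the buckets into contiguous arrays plus a block pointer array.
    blk_ptr, row_ind, col_ind, val = [0], [], [], []
    for bucket in buckets:
        for lr, lc, v in bucket:
            row_ind.append(lr)
            col_ind.append(lc)
            val.append(v)
        blk_ptr.append(len(val))

    return {"n": n, "beta": beta, "nb": nb, "blk_ptr": blk_ptr,
            "row_ind": row_ind, "col_ind": col_ind, "val": val}


def csb_multiply(csb, x, transpose=False):
    """Compute A x (or A^T x when transpose=True) with one pass over the blocks."""
    n, beta, nb = csb["n"], csb["beta"], csb["nb"]
    blk_ptr, row_ind, col_ind, val = (csb["blk_ptr"], csb["row_ind"],
                                      csb["col_ind"], csb["val"])
    y = [0.0] * n
    for bi in range(nb):
        for bj in range(nb):
            b = bi * nb + bj
            for k in range(blk_ptr[b], blk_ptr[b + 1]):
                r = bi * beta + row_ind[k]  # global row index
                c = bj * beta + col_ind[k]  # global column index
                if transpose:
                    y[c] += val[k] * x[r]
                else:
                    y[r] += val[k] * x[c]
    return y


# Small usage example on a 4 x 4 matrix with five nonzeros.
triples = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0), (2, 0, 4.0), (3, 2, 5.0)]
A = build_csb(4, triples)
x = [1.0, 2.0, 3.0, 4.0]
y = csb_multiply(A, x)
yt = csb_multiply(A, x, transpose=True)
```

In this sketch neither product needs a second copy of the matrix or a different traversal, which is the property the abstract contrasts with CSR, where the row-oriented layout favors Ax.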
