Concurrency and Computation

Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems



Abstract

The block Wiedemann (BW) algorithm is frequently used to solve large sparse linear systems over GF(2). Its most time-consuming operation is iterative sparse matrix-vector multiplication. The need to accelerate this step is motivated by the application of BW to the very large matrices that arise in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation using a newly designed hybrid sparse matrix format. For a number of tested NFS matrices, this yields speedups of 4 to 8 on a single graphics processing unit (GPU) compared with an optimized multicore implementation. We further present a GPU cluster implementation of the full BW algorithm for NFS matrices. A small GPU cluster is able to outperform larger CPU clusters on large matrices such as the one obtained from the Kilobit special NFS factorization.
