Journal: IEEE Transactions on Parallel and Distributed Systems

Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture



Abstract

To improve the performance of sparse Cholesky factorization, existing research groups adjacent columns of the sparse matrix that share the same nonzero pattern into supernodes for parallelization. However, because sparse matrices vary widely in structure, the amount of computation per supernode varies significantly, which makes the supernodes hard to optimize when they are computed with dense matrix kernels. Therefore, how to efficiently map sparse Cholesky factorization to emerging architectures, such as the Sunway many-core processor, remains an active research direction. In this article, we propose swCholesky, a highly optimized implementation of sparse Cholesky factorization on the Sunway processor. Specifically, we design three kernel task queues and a dense matrix library to dynamically adapt to the kernel characteristics and architecture features. In addition, we propose an auto-tuning mechanism to search for the optimal settings of the important parameters in swCholesky. Our experiments show that swCholesky achieves better performance than state-of-the-art implementations.
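
The supernode grouping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from swCholesky. It assumes the nonzero pattern of each column of the lower-triangular factor is already known, and it uses one common textbook criterion: column j joins the supernode of column j-1 when its pattern equals that of column j-1 with the diagonal entry j-1 removed. The function name `find_supernodes` and the sample patterns are hypothetical.

```python
def find_supernodes(col_patterns):
    """Group adjacent columns of a lower-triangular factor into supernodes.

    col_patterns[j] is the set of row indices (>= j, diagonal included)
    that hold nonzeros in column j. This input layout is an assumption
    made for illustration only.
    Returns a list of (first_col, last_col) inclusive column ranges.
    """
    n = len(col_patterns)
    supernodes = []
    start = 0
    for j in range(1, n):
        # Column j extends the current supernode only when its pattern
        # equals the previous column's pattern minus the previous diagonal.
        if col_patterns[j - 1] - {j - 1} != col_patterns[j]:
            supernodes.append((start, j - 1))
            start = j
    if n > 0:
        supernodes.append((start, n - 1))
    return supernodes


# Hypothetical 5x5 factor: columns 0-2 share a pattern, as do columns 3-4.
patterns = [
    {0, 1, 2, 4},
    {1, 2, 4},
    {2, 4},
    {3, 4},
    {4},
]
print(find_supernodes(patterns))
```

Each resulting column range can be stored as a dense block, which is what allows dense matrix kernels to be applied. The block sizes depend entirely on the sparsity structure, which is the source of the workload variation the abstract describes.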
