Journal of Computational Science

Optimized sparse Cholesky factorization on hybrid multicore architectures

Abstract

We present techniques for supernodal sparse Cholesky factorization on a hybrid multicore platform consisting of a multicore CPU and a GPU. The techniques are the subtree algorithm, pipelining, and multithreading. The subtree algorithm [15] minimizes PCIe transfers by storing an entire branch of the elimination tree (a tree data structure describing the workflow of the factorization) in GPU memory, and it also reduces total kernel launch time by launching BLAS kernels in batches. The pipelining technique overlaps the execution of GPU kernels with PCIe data transfers. The multithreading technique [17] creates multiple threads for both the CPU and the GPU to exploit the concurrency in the elimination tree. Our experimental results on a platform consisting of an Intel multicore processor and an Nvidia GPU indicate that, once these techniques are applied, performance and energy efficiency improve significantly over CHOLMOD (SuiteSparse 4.5.3), a sparse Cholesky factorization package.
(C) 2018 Elsevier B.V. All rights reserved.
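The elimination tree can be computed directly from the sparsity pattern of A, and its shape determines which parts of the factorization are independent. The following Python sketch is illustrative only and is not the paper's implementation. It builds the elimination tree with Liu's algorithm using path compression (the same approach as CSparse's `cs_etree`). It then groups nodes into "waves" by height, so that no two nodes in a wave are ancestor and descendant and they could be processed concurrently. This level schedule is a swapped-in, simple way to expose tree-level concurrency, not the subtree algorithm or multithreading scheme from the paper. The input format (`upper_rows[j]` lists the row indices i < j of nonzeros A[i, j]), the function names `elimination_tree`, `level_schedule`, and `run_waves`, and the `factor_node` callback are all hypothetical. A real supernodal code would factor whole supernodes with dense BLAS/LAPACK kernels instead of calling a Python function per column.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def elimination_tree(upper_rows: List[List[int]]) -> List[int]:
    """Return parent[j] for each column j of a symmetric matrix A (-1 marks a root).

    upper_rows[j] lists row indices i < j with A[i, j] != 0 (hypothetical input
    format). Liu's algorithm; `ancestor` holds path-compressed root pointers.
    """
    n = len(upper_rows)
    parent = [-1] * n
    ancestor = [-1] * n
    for j in range(n):
        for i in upper_rows[j]:
            # Climb from i toward the root of its current subtree, redirecting
            # each visited node's ancestor to j so later climbs are short.
            while i != -1 and i < j:
                next_i = ancestor[i]
                ancestor[i] = j
                if next_i == -1:
                    parent[i] = j  # i was a root, so j becomes its parent
                i = next_i
    return parent


def level_schedule(parent: List[int]) -> List[List[int]]:
    """Group nodes into waves; nodes in one wave are never ancestor/descendant."""
    level = [0] * len(parent)
    # parent[j] > j in an elimination tree, so a forward sweep finalizes each
    # node's level before that node propagates it to its parent.
    for j, p in enumerate(parent):
        if p != -1:
            level[p] = max(level[p], level[j] + 1)
    waves: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for j, lv in enumerate(level):
        waves[lv].append(j)
    return waves


def run_waves(waves: List[List[int]], factor_node: Callable[[int], T],
              max_workers: int = 4) -> List[T]:
    """Process waves in order; nodes inside one wave are submitted to a thread pool.

    factor_node is a hypothetical placeholder for per-node factorization work.
    (Pure-Python callbacks do not run in parallel under the GIL; this only
    illustrates the dependency structure.)
    """
    results: List[T] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in waves:
            # extend() drains the iterator from pool.map, which waits for every
            # task in this wave before the next wave (holding parents) starts.
            results.extend(pool.map(factor_node, wave))
    return results


if __name__ == "__main__":
    # Two independent branches {0, 1} and {2, 3} join at node 4, whose parent is 5.
    pattern = [[], [0], [], [2], [1, 3], [4]]
    parent = elimination_tree(pattern)
    print(parent)  # [1, 4, 3, 4, 5, -1]
    waves = level_schedule(parent)
    print(waves)  # [[0, 2], [1, 3], [4], [5]]
    print(run_waves(waves, lambda j: f"col{j}"))  # ['col0', 'col2', 'col1', 'col3', 'col4', 'col5']
```

In this example the two branches under node 4 never interact, which is the kind of concurrency an elimination tree exposes. Path compression keeps the tree construction close to linear in the number of nonzeros, because repeated climbs through the same nodes are short-circuited.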