An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Abstract

In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with the eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in GPU memory. The hybrid scheme offloads matrix operations to GPU processors while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of an implementation on a CPU with Intel Xeon Phi coprocessors using the automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG can reach a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
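
The abstract does not spell out how the eigendecomposition replaces QR, so the following is only a minimal NumPy sketch of one common way to do it, not necessarily the authors' formulation. It assumes a tall n-by-s block `P` of candidate directions, forms the small s-by-s Gram matrix, and rescales by its eigenpairs. The function name `orthonormalize_via_eig` and the tolerance `tol` are hypothetical choices made for illustration.

```python
import numpy as np

def orthonormalize_via_eig(P, tol=1e-12):
    """Orthonormalize the columns of a tall block P without a QR factorization.

    Assumption (not taken from the paper): we use the Gram matrix G = P^T P.
    With G = V diag(w) V^T, the block Q = P V diag(w)^(-1/2) satisfies Q^T Q = I.
    """
    G = P.T @ P                    # small s x s matrix, cheap when s << n
    w, V = np.linalg.eigh(G)       # symmetric eigendecomposition, w ascending
    keep = w > tol * w.max()       # drop numerically dependent directions
    # Divide each retained eigenvector (column) by sqrt of its eigenvalue.
    return P @ (V[:, keep] / np.sqrt(w[keep]))

# Usage on a random, well-conditioned block with s = 8 right-hand sides.
rng = np.random.default_rng(0)
P = rng.standard_normal((10_000, 8))
Q = orthonormalize_via_eig(P)
err = np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))
print(Q.shape, err)
```

The only dense factorization here acts on an s-by-s matrix, and the remaining work is two tall matrix products, which map well to GPU matrix-multiply kernels. The trade-off is that forming the Gram matrix squares the condition number of `P`, so this approach can lose accuracy for nearly dependent columns where a QR factorization would not.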

Bibliographic details

Source: Co-HPC 2014: 1st International Workshop on Hardware-Software Co-design for High Performance Computing, held in conjunction with SC14: The International Conference for High Performance Computing, Networking, Storage and Analysis | 2014 | pp. 72-77 | 6 pages
Conference location: New Orleans, LA (US)
Authors: Hao Ji; Masha Sosonkina; Yaohang Li
Author affiliation: Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA
Original format: PDF
Language: English
Keywords:
conjugate gradient methods; graphics processing units; multiprocessing systems; performance evaluation; BCG algorithm; CPU-GPU memory transaction; CPU-GPU processors; CPU-Intel Xeon Phi coprocessors; QR decomposition; automatic offload mode; block conjugate gradient algorithm; hybrid computing scheme; linear systems; offload computing scheme; sparse coefficient matrices; Clocks; Coprocessors; Data transfer; Graphics processing units; Matrix decomposition; Sparse matrices; Block Conjugate Gradient; Multi-core CPU; Gr;

Date added to database: 2022-08-26 14:07:11

Similar literature

1. A multi-grained distributed implementation of the parallel Block Conjugate Gradient algorithm [J]. A. Murli, L. D'Amore, Laccetti. Concurrency, practice and experience. 2010, no. 15
2. Convergence conditions, line search algorithms and trust region implementations for the Polak-Ribière conjugate gradient method [J]. L. Grippo, S. Lucidi. Optimization methods & software. 2005, no. 1
3. An increasing-angle property of the conjugate gradient method and the implementation of large-scale minimization algorithms with line searches [J]. Yu-Hong Dai, Jose Mario Martinez, Jin-Yun Yuan. Numerical linear algebra with applications. 2003, no. 4
4. Complex block orthogonal gradient adaptive-based algorithm with conjugate gradient principle [C]. Suchada Sitjongsataporn, Aphichata Thongrak. International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology. 2016
5. Algorithmic and software system support to accelerate data processing in CPU-GPU hybrid computing environments [D]. Wang, Kaibo. 2015
6. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards [O]. Francesc Massanes, Marie Cadennes, Jovan G. Brankov
7. The behavior of conjugate gradient algorithms on a multivector processor with a hierarchical memory [O]. Meier Ulrike, Sameh Ahmed. 1988
8. Behaviour of Conjugate Gradient Based Algorithms on a Multi-Vector Processor with a Memory Hierarchy [R]. Jalby, W., Meier, U., Sameh, A. 1986
