International Conference on Algorithms and Architectures for Parallel Processing

GPU Acceleration of Finding Maximum Eigenvalue of Positive Matrices


Abstract

Matrix eigenvalue theory is an important analysis tool in scientific computing. Many applications need only the maximum eigenvalue rather than the full spectrum. Existing algorithms for finding the maximum eigenvalue of a matrix are implemented sequentially, and the computational workload grows as the matrix order increases, so traditional sequential methods cannot meet the need for fast computation on large matrices. This paper proposes a parallel algorithm named PA-ST that finds the maximum eigenvalue of positive matrices by similarity transformation, implemented with CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit). To the best of our knowledge, this is the first CUDA-based parallel algorithm for calculating the maximum eigenvalue of matrices. To improve performance, several optimization techniques are applied: using shared memory rather than global memory to speed up computation, avoiding bank conflicts by setting the span index, satisfying the principle of coalesced memory access, and using single-precision floating-point arithmetic and pinned memory to reduce copy operations and obtain higher data transfer bandwidth between the host and the GPU device. The experimental results show that the similarity transformation technique implemented on the GPU significantly shortens the running time compared with the sequential algorithm, and that the speedup ratio is nearly stable as the number of iterations increases. As the matrix order increases, the running times of both the sequential algorithm and PA-ST increase correspondingly. Experiments also show that the speedup ratio of PA-ST is between 2.85 and 35.028.
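The abstract does not spell out the similarity transformation itself. As an illustration only, the following sequential Python sketch shows one common variant, a diagonal similarity transformation driven by row sums; it is an assumption and not necessarily the scheme used in PA-ST. It relies on two standard facts: a similarity transformation preserves eigenvalues, and for a positive matrix the maximum eigenvalue lies between the smallest and largest row sums. The function name `max_eigenvalue_positive` and the parameters `tol` and `max_iter` are hypothetical.

```python
def max_eigenvalue_positive(a, tol=1e-10, max_iter=1000):
    """Estimate the maximum eigenvalue of a square matrix whose entries are all > 0.

    Illustrative sketch (assumed variant, not the paper's code):
    repeatedly replace A with D^{-1} A D, where D = diag(row sums of A).
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy so the caller's matrix is untouched
    lo = hi = 0.0
    for _ in range(max_iter):
        r = [sum(row) for row in a]      # row sums of the current matrix
        lo, hi = min(r), max(r)          # lo <= maximum eigenvalue <= hi
        if hi - lo <= tol * hi:          # bounds have (relatively) met
            break
        # Similarity transform B = D^{-1} A D, i.e. b_ij = a_ij * r_j / r_i.
        a = [[a[i][j] * r[j] / r[i] for j in range(n)] for i in range(n)]
    return 0.5 * (lo + hi)               # midpoint of the enclosing interval


if __name__ == "__main__":
    # Eigenvalues of [[1, 2], [3, 4]] are (5 +/- sqrt(33)) / 2.
    print(max_eigenvalue_positive([[1.0, 2.0], [3.0, 4.0]]))
```

In this sketch each row sum and each scaled entry can be computed independently of the others, which is the kind of per-row and per-element work a GPU kernel could distribute across threads. How PA-ST actually maps the computation to CUDA threads and shared memory is not stated in the abstract.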
