Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators

Abstract

The use of linear algebra routines is fundamental to many areas of computational science, yet their implementation in software still forms the main computational bottleneck in many widely used algorithms. In machine learning and computational statistics, for example, the use of Gaussian distributions is ubiquitous, and routines for calculating the Cholesky decomposition, matrix inverse and matrix determinant must often be called many thousands of times in common algorithms such as Markov chain Monte Carlo. These linear algebra routines consume most of the total computational time of a wide range of statistical methods, so any improvement in this area will greatly increase the overall efficiency of algorithms used in many scientific application areas. The importance of linear algebra algorithms is clear from the substantial effort invested over the last 25 years in producing low-level software libraries such as LAPACK, which generally optimise these routines by breaking a large problem up into smaller problems that may be computed independently. The performance of such libraries is, however, strongly dependent on the specific hardware available. LAPACK was originally developed for single-core processors with a memory hierarchy, whereas modern-day computers often consist of mixed architectures, with large numbers of parallel cores and graphics processing units (GPUs) used alongside traditional CPUs. The challenge lies in making optimal use of these different types of computing units, which generally have very different processor speeds and types of memory. In this thesis we develop novel low-level algorithms that may be generally employed in blocked linear algebra routines and that automatically optimise themselves to take full advantage of whatever heterogeneous architectures are available.
We present a comparison of our methods with MAGMA, the state-of-the-art open-source implementation of LAPACK designed specifically for hybrid architectures, and demonstrate that our novel algorithms can deliver speed increases of up to 400%, specifically when running the commonly used Cholesky decomposition, matrix inverse and matrix determinant routines.
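
To make the idea of a blocked routine concrete, the following is a minimal sketch of a textbook right-looking blocked Cholesky factorisation in Python with NumPy. It is not the thesis's hybrid CPU/GPU algorithm and it runs on the CPU only; the function name `blocked_cholesky`, the `block` parameter and the test matrix are assumptions made purely for illustration. It also shows how the log-determinant needed for Gaussian densities follows from the factor.

```python
import numpy as np


def blocked_cholesky(a, block=64):
    """Return lower-triangular L with L @ L.T == a (a symmetric positive definite).

    Hypothetical illustration of a generic blocked algorithm, not the
    thesis's method.
    """
    a = np.array(a, dtype=float)  # work on a copy, leave the input untouched
    n = a.shape[0]
    for k in range(0, n, block):
        e = min(k + block, n)
        # 1. Factor the small diagonal block with an unblocked routine.
        a[k:e, k:e] = np.linalg.cholesky(a[k:e, k:e])
        if e < n:
            # 2. Panel below the diagonal: find L21 with L21 @ L11.T == A21.
            #    A general solve is used for brevity; a triangular solve
            #    would be the usual choice in a tuned implementation.
            a[e:, k:e] = np.linalg.solve(a[k:e, k:e], a[e:, k:e].T).T
            # 3. Trailing update: one large matrix product, the step that
            #    blocked libraries typically hand to the fastest device.
            a[e:, e:] -= a[e:, k:e] @ a[e:, k:e].T
    return np.tril(a)  # discard the untouched upper triangle


# Small symmetric positive definite test matrix (illustrative data only).
rng = np.random.default_rng(0)
m = rng.standard_normal((200, 200))
spd = m @ m.T + 200 * np.eye(200)

L = blocked_cholesky(spd, block=32)
# log det(A) = 2 * sum(log(diag(L))), as used in Gaussian log-densities.
log_det = 2.0 * np.sum(np.log(np.diag(L)))
```

Most of the arithmetic sits in step 3, a single large matrix product per block column, which is why the choice of block size and of which device performs each step matters so much on mixed hardware.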

Bibliographic details

  • Author

    Macindoe GI;

  • Year 2013
  • Original format PDF
  • Language eng
