Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators

Abstract

The use of linear algebra routines is fundamental to many areas of computational science, yet their implementation in software still forms the main computational bottleneck in many widely used algorithms. In machine learning and computational statistics, for example, the use of Gaussian distributions is ubiquitous, and routines for calculating the Cholesky decomposition, matrix inverse and matrix determinant must often be called many thousands of times for common algorithms, such as Markov chain Monte Carlo. These linear algebra routines consume most of the total computational time of a wide range of statistical methods, and any improvements in this area will therefore greatly increase the overall efficiency of algorithms used in many scientific application areas. The importance of linear algebra algorithms is clear from the substantial effort that has been invested over the last 25 years in producing low-level software libraries such as LAPACK, which generally optimise these linear algebra routines by breaking up a large problem into smaller problems that may be computed independently. The performance of such libraries is, however, strongly dependent on the specific hardware available. LAPACK was originally developed for single-core processors with a memory hierarchy, whereas modern-day computers often consist of mixed architectures, with large numbers of parallel cores and graphics processing units (GPUs) being used alongside traditional CPUs. The challenge lies in making optimal use of these different types of computing units, which generally have very different processor speeds and types of memory. In this thesis we develop novel low-level algorithms that may be generally employed in blocked linear algebra routines, which automatically optimise themselves to take full advantage of the variety of heterogeneous architectures that may be available. We present a comparison of our methods with MAGMA, the state-of-the-art open-source implementation of LAPACK designed specifically for hybrid architectures, and demonstrate a speed increase of up to 400% using our novel algorithms, specifically when running commonly used Cholesky matrix decomposition, matrix inverse and matrix determinant routines.
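
The abstract refers to blocked routines that break a large problem into smaller ones. As a rough illustration only, and not the thesis's hybrid CPU/GPU algorithm (which this record does not reproduce), a minimal pure-Python sketch of a right-looking blocked Cholesky factorisation might look like the following. The names `cholesky_blocked` and `log_det_from_cholesky` and the `block` parameter are assumptions made for this example.

```python
import math


def cholesky_blocked(a, block=2):
    """Return lower-triangular L (list of lists) with L * L^T == a.

    Illustrative sketch: `a` is assumed symmetric positive definite.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy; only the lower triangle is used
    for k in range(0, n, block):
        e = min(k + block, n)
        # Step 1: unblocked Cholesky of the diagonal block w[k:e][k:e].
        for j in range(k, e):
            d = w[j][j] - sum(w[j][p] ** 2 for p in range(k, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            w[j][j] = math.sqrt(d)
            for i in range(j + 1, e):
                s = sum(w[i][p] * w[j][p] for p in range(k, j))
                w[i][j] = (w[i][j] - s) / w[j][j]
        # Step 2: panel solve L21 = A21 * L11^{-T}; each row i is independent.
        for i in range(e, n):
            for j in range(k, e):
                s = sum(w[i][p] * w[j][p] for p in range(k, j))
                w[i][j] = (w[i][j] - s) / w[j][j]
        # Step 3: trailing update A22 -= L21 * L21^T (lower triangle only);
        # each entry (i, j) is independent of the others.
        for i in range(e, n):
            for j in range(e, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, e))
    # Zero the upper triangle so the result is a clean lower-triangular factor.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def log_det_from_cholesky(l):
    """log(det(A)) for A = L * L^T, i.e. twice the sum of log diagonal entries."""
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(l)))


# Small symmetric positive-definite example matrix (chosen for this sketch).
A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
L = cholesky_blocked(A, block=2)
```

For this example matrix the factor has rows (2, 0, 0), (6, 1, 0) and (-8, 5, 3), so the determinant is 36. Steps 2 and 3 consist of independent row and entry updates, which is the kind of work a blocked routine can hand to parallel cores or an accelerator; how the thesis actually divides work between CPU and GPU is not stated in the abstract.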

Bibliographic details

Author: Macindoe GI
Year: 2013
Original format: PDF
Language: English

Similar literature

1. A Dual Heterogeneous Island Genetic Algorithm for Solving Large Size Flexible Flow Shop Scheduling Problems on Hybrid Multicore CPU and GPU Platforms [J]. Luo Jia, El Baz Didier. Mathematical Problems in Engineering. 2019, No. 6
2. A Dual Heterogeneous Island Genetic Algorithm for Solving Large Size Flexible Flow Shop Scheduling Problems on Hybrid Multicore CPU and GPU Platforms [J]. Jia Luo, Didier El Baz. Mathematical Problems in Engineering: Theory, Methods and Applications. 2019, No. 1
3. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU [J]. Lyakh Dmitry I. Computer Physics Communications. 2015
4. A Hybrid Implementation of Two-Level Domain Decomposition Algorithm for Solving Elliptic Equation on CPU/GPUs [C]. Luo Li, Zhao Yubo, Cai Xiao-Chuan. International Conference on Parallel and Distributed Computing, Applications and Technologies. 2012
5. Efficient Viewshed Computation Algorithms on GPUs and CPUs [D]. Qarah, Faisal F. 2020
6. Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines [O]. George Teodoro, Tony Pan, Tahsin Kurc
7. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU [O]. Dmitry I. Lyakh. 2015
8. Block-Iterative Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs [R]. Rodriguez, M., Philip, B., Wang, Z. 2014
