Conference: Tutorial on High Performance Numerical Tools for the Development and Scalability of High-End Computer Applications Conference

Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

Abstract

The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required for updating the column norms. In this paper, we propose an implementation of the QRP algorithm that distributes the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general-purpose GPU processors, where our implementation obtains up to 90 GFlops on an NVIDIA GeForce GTX480. This is about 2 times faster than the QRP implementation of MAGMA (version 1.2.1). Topics: Parallel and Distributed Computing.
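To make the two ideas in the abstract concrete, here is a minimal, unblocked sketch in pure Python of Householder QR with column pivoting in which the pivot search runs over columns assigned round-robin to `p` workers. It is an illustration only, not the authors' implementation: the names `cyclic_owner` and `qrp_column_cyclic`, the sequential loop over workers, and the plain norm downdate (without the recomputation safeguard a production routine would need) are all assumptions made for this example.

```python
import math


def cyclic_owner(j, p):
    """Worker that owns column position j under a round-robin layout (assumed mapping)."""
    return j % p


def qrp_column_cyclic(A, p=2):
    """Unblocked Householder QR with column pivoting on a list-of-rows matrix.

    Returns (R, perm): R is upper triangular and perm[k] is the original
    index of the column that ends up in position k, so A[:, perm] = Q R.
    The workers are simulated sequentially; nothing here runs in parallel.
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]  # column storage
    perm = list(range(n))
    norms2 = [sum(x * x for x in c) for c in cols]  # squared column norms

    for k in range(min(m, n)):
        # Each worker finds the largest-norm column among those it owns,
        # then the local winners are reduced to one global pivot.
        local_best = []
        for w in range(p):
            owned = [j for j in range(k, n) if cyclic_owner(j, p) == w]
            if owned:
                local_best.append(max(owned, key=lambda j: norms2[j]))
        piv = max(local_best, key=lambda j: norms2[j])

        cols[k], cols[piv] = cols[piv], cols[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        norms2[k], norms2[piv] = norms2[piv], norms2[k]

        # Householder vector v for the trailing part of the pivot column.
        x = cols[k][k:]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            break  # remaining columns are zero below row k
        if x[0] > 0:
            alpha = -alpha  # sign chosen to avoid cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        cols[k][k] = alpha
        for i in range(k + 1, m):
            cols[k][i] = 0.0

        # Apply the reflector to the trailing columns and downdate their norms.
        for j in range(k + 1, n):
            c = cols[j]
            s = 2.0 * sum(v[i] * c[k + i] for i in range(m - k)) / vtv
            for i in range(m - k):
                c[k + i] -= s * v[i]
            # Simplified downdate; subject to cancellation in general.
            norms2[j] = max(norms2[j] - c[k] * c[k], 0.0)

    r = min(m, n)
    R = [[cols[j][i] if i <= j else 0.0 for j in range(n)] for i in range(r)]
    return R, perm


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
R, perm = qrp_column_cyclic(A, p=2)
print(perm[0])  # index of the column with the largest norm
```

In this sketch the per-column work (the inner product with `v`, the update, and the norm downdate) touches only one column at a time, which is why a column-wise assignment to workers keeps each worker on its own data; the only global step per iteration is the pivot reduction.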
