Adapting communication-avoiding LU and QR factorizations to multicore architectures



Abstract

In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication-optimal algorithms, referred to as communication-avoiding LU (CALU) and communication-avoiding QR (CAQR), have been introduced for distributed-memory architectures. We discuss two algorithms based on CAQR and CALU that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding routines from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EMT64 processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding dgetrf routine from the Intel MKL library by up to a factor of 2.3 and the dgetrf routine from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
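The abstract does not spell out how a tall and skinny matrix is factored with reduced communication. As a rough illustration only, the following is a minimal sketch of a one-level tall-skinny QR (TSQR) reduction, the kind of kernel that communication-avoiding QR builds on. It is not the authors' implementation: the function name `tsqr_r`, the block count, the flat (one-level) reduction, and the use of a thread pool to mimic independent tasks are all assumptions made for this example, and only the R factor is computed.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def tsqr_r(a, num_blocks=4):
    """Return the R factor of a tall-skinny matrix via a one-level TSQR reduction.

    Hypothetical helper for illustration; not taken from the paper.
    """
    m, n = a.shape
    if m < num_blocks * n:
        raise ValueError("each row block needs at least n rows")
    # Split the rows into independent blocks; each block is one task.
    blocks = np.array_split(a, num_blocks, axis=0)
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        # Local QR on every block; only the small n-by-n R factors are kept.
        local_rs = list(pool.map(lambda b: np.linalg.qr(b, mode="r"), blocks))
    # Stack the small R factors and reduce them with one more QR.
    return np.linalg.qr(np.vstack(local_rs), mode="r")


rng = np.random.default_rng(0)
a = rng.standard_normal((4000, 8))
r_tsqr = tsqr_r(a)
r_ref = np.linalg.qr(a, mode="r")
# R is unique only up to the sign of each row, so compare absolute values.
print(np.allclose(np.abs(r_tsqr), np.abs(r_ref)))
```

The point of the sketch is that each block is factored without touching the others, and only the small R factors are combined afterwards, so the data exchanged between tasks is independent of the number of rows. A dynamically scheduled multicore version would typically use a reduction tree and finer-grained tasks rather than this single combining step.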


Bibliographic details

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) | 2010 | pp. 1-10 | 10 pages
Conference location: Atlanta, GA (US)
Authors
Donfack S.; Grigori L.; Gupta A.K.
Author affiliation

INRIA Saclay-Ile de France, Univ. Paris-Sud 11, Orsay, France;

Original format: PDF
Text language: English
Chinese Library Classification: TP311.133
Keywords
LU and QR factorizations; communication avoiding algorithms; multicore architectures;


Similar literature


1. Parallel tiled QR factorization for multicore architectures [J]. Alfredo Buttari, Julien Langou, Jakub Kurzak. Concurrency and Computation. 2008, No. 13
2. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems [J]. Sao Piyush, Li Xiaoye S., Vuduc Richard. Journal of Parallel and Distributed Computing. 2019, Issue SEPa
3. Adapting communication-avoiding LU and QR factorizations to multicore architectures [C]. Donfack Simplice, Grigori Laura, Gupta Alok Kumar. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010
4. Performance Optimization for Sparse Matrix Factorization Algorithms on Hybrid Multicore Architectures [D]. Tang, Meng. 2020
5. Evolutionary profiles from the QR factorization of multiple sequence alignments [O]. Anurag Sethi, Patrick O'Donoghue, Zaida Luthey-Schulten. 2005
6. Scalable tile communication-avoiding QR factorization on multicore cluster systems [O]. Fengguang Song, Hatem Ltaief, Bilel Hadri. 2014
7. Automatic Blocking of QR and LU Factorizations for Locality [R]. Yi, Q., Kennedy, K., You, H. 2004
