Adapting communication-avoiding LU and QR factorizations to multicore architectures

Abstract

In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication-optimal algorithms, referred to as communication-avoiding LU (CALU) and communication-avoiding QR (CAQR), have been introduced for distributed-memory architectures. We discuss two algorithms based on CALU and CAQR that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For matrices that are tall and skinny, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding algorithms from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EMT64 processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding routine dgetrf from the Intel MKL library by up to a factor of 2.3 and the corresponding routine dgetrf from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
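To make the tall-and-skinny case concrete, the sketch below illustrates the reduction-tree idea behind TSQR, the tall-skinny QR kernel on which CAQR's panel factorization is built. It is a minimal, sequential Python sketch and not the authors' multithreaded implementation: the matrix is split into row blocks, each block is factored independently (these leaf factorizations are the kind of tasks a multicore scheduler could run concurrently), and the small R factors are then stacked pairwise and refactored up a binary tree until one R remains. The helper names `householder_r`, `tsqr_r`, and `gram`, the binary tree shape, and the choice to compute only R (leaving Q implicit) are illustrative assumptions.

```python
import math
import random

# Illustrative sketch only: helper names are hypothetical, not from the paper or MKL/ACML.
# Matrices are plain lists of row lists, so only the standard library is needed.

def householder_r(a):
    """Return the n x n upper-triangular R of a tall m x n matrix (m >= n) via Householder QR."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # work on a copy
    for k in range(n):
        # Norm of column k from the diagonal downward.
        norm = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue  # column k is already zero from the diagonal down
        # Choose the sign of alpha that avoids cancellation in v[0].
        alpha = -norm if r[k][k] >= 0 else norm
        v = [r[i][k] for i in range(k, m)]
        v[0] -= alpha
        vnorm2 = sum(x * x for x in v)
        # Apply the reflector H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            scale = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                r[i][j] -= scale * v[i - k]
    # Keep the top n rows and set round-off below the diagonal to exact zeros.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, num_blocks):
    """R factor of a tall-skinny matrix via a binary reduction tree (hypothetical helper)."""
    m, n = len(a), len(a[0])
    size = math.ceil(m / num_blocks)
    blocks = [a[i:i + size] for i in range(0, m, size)]
    if any(len(b) < n for b in blocks):
        raise ValueError("each row block needs at least n rows")
    # Leaves: independent local QRs; a multicore code could schedule these as parallel tasks.
    rs = [householder_r(b) for b in blocks]
    # Tree levels: stack pairs of n x n R factors (2n x n) and factor again.
    while len(rs) > 1:
        rs = [householder_r(rs[i] + rs[i + 1]) if i + 1 < len(rs) else rs[i]
              for i in range(0, len(rs), 2)]
    return rs[0]


def gram(x):
    """Return X^T X for a matrix stored as a list of rows."""
    n = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    # A 64 x 4 tall-and-skinny test matrix.
    A = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(64)]
    R = tsqr_r(A, num_blocks=4)
    # Every tree level applies orthogonal transformations, so R^T R equals A^T A up to rounding.
    err = max(abs(g - h) for gr, ga in zip(gram(R), gram(A)) for g, h in zip(gr, ga))
    print(f"max |R^T R - A^T A| = {err:.2e}")
```

Comparing R^T R with A^T A is a convenient check because it does not depend on the sign convention of each row of R; the diagonal of R also matches that of a direct Householder QR of A up to sign. In this sketch, each tree node combines only two n x n triangles rather than full columns of A, which is the property that lets TSQR-style reductions keep data movement small for matrices with many more rows than columns.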

Bibliographic record

Source: IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, 10 pages
Authors: Donfack S.; Grigori L.; Gupta A.K.
Original format: PDF
Chinese Library Classification: TP311.138-53
Keywords: LU and QR factorizations; communication-avoiding algorithms; multicore architectures

Similar literature

1. Parallel tiled QR factorization for multicore architectures [J]. Alfredo Buttari, Julien Langou, Jakub Kurzak. Concurrency and Computation, 2008, no. 13.
2. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems [J]. Piyush Sao, Xiaoye S. Li, Richard Vuduc. Journal of Parallel and Distributed Computing, 2019, no. SEPa.
3. Adapting communication-avoiding LU and QR factorizations to multicore architectures [C]. Simplice Donfack, Laura Grigori, Alok Kumar Gupta. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
4. Performance Optimization for Sparse Matrix Factorization Algorithms on Hybrid Multicore Architectures [D]. Meng Tang. 2020.
5. Evolutionary profiles from the QR factorization of multiple sequence alignments [O]. Anurag Sethi, Patrick O'Donoghue, Zaida Luthey-Schulten. 2005.
6. Scalable tile communication-avoiding QR factorization on multicore cluster systems [O]. Fengguang Song, Hatem Ltaief, Bilel Hadri. 2014.
7. Automatic Blocking of QR and LU Factorizations for Locality [R]. Q. Yi, K. Kennedy, H. You. 2004.
