Conference: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Adapting communication-avoiding LU and QR factorizations to multicore architectures



Abstract

In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication-optimal algorithms have been introduced for distributed-memory architectures, referred to as communication-avoiding LU (CALU) and communication-avoiding QR (CAQR). We discuss two algorithms based on CALU and CAQR that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding routines from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EM64T processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding routine dgetrf from the Intel MKL library by up to a factor of 2.3 and the corresponding routine dgetrf from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding routine dgeqrf from the MKL library by a factor of 5.3.
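
To illustrate why tall and skinny matrices suit a communication-avoiding approach, the following is a minimal sequential sketch of the tall-skinny QR (TSQR) reduction tree that communication-avoiding QR is generally built on. It is not the paper's multithreaded implementation and includes no task scheduling. The names `qr_r` and `tsqr_r` are hypothetical, the local factorization uses modified Gram-Schmidt purely for brevity, and every row block is assumed to have at least as many rows as columns and full column rank.

```python
from math import sqrt


def qr_r(rows):
    """Return the n x n upper-triangular R factor of a matrix given as a list of rows.

    Uses modified Gram-Schmidt; assumes full column rank (illustrative only).
    """
    m, n = len(rows), len(rows[0])
    # Work on column copies so the caller's matrix is left untouched.
    cols = [[rows[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def tsqr_r(rows, num_blocks):
    """R factor of a tall-skinny matrix via a binary reduction tree over row blocks."""
    size = -(-len(rows) // num_blocks)  # ceiling division
    # Leaf step: each row block is factored independently of the others.
    level = [qr_r(rows[i:i + size]) for i in range(0, len(rows), size)]
    # Reduction step: stack pairs of small R factors and refactor them.
    while len(level) > 1:
        nxt = [qr_r(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired R factor moves up unchanged
        level = nxt
    return level[0]


# Hypothetical 8 x 2 example matrix, split into four 2 x 2 row blocks.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0],
     [2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
R_tree = tsqr_r(A, 4)
R_direct = qr_r(A)
```

In this sketch the leaf factorizations are independent of one another, and only the small n x n factors are combined higher in the tree, so the amount of data exchanged between blocks depends on the number of columns rather than the number of rows. In exact arithmetic `R_tree` and `R_direct` coincide, since both have a positive diagonal.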
