Conference: IEEE International Symposium on Parallel & Distributed Processing

Adapting communication-avoiding LU and QR factorizations to multicore architectures

Abstract

In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication-optimal algorithms were introduced for distributed-memory architectures, referred to as communication-avoiding LU (CALU) and communication-avoiding QR (CAQR). We discuss two algorithms based on CAQR and CALU that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding algorithms from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EM64T processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding dgetrf routine from the Intel MKL library by up to a factor of 2.3 and the corresponding dgetrf routine from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
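The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general tall-skinny QR idea on which communication-avoiding QR is usually described as resting: factor row blocks independently, then combine the small R factors pairwise in a reduction tree. It is not the authors' multithreaded implementation and does no dynamic task scheduling. The names `qr_r` and `tsqr_r`, the block size, and the test matrix are all hypothetical, and the pure-Python Gram-Schmidt kernel is an assumption chosen to keep the example self-contained, where a real code would call a LAPACK routine such as dgeqrf.

```python
import math


def qr_r(a):
    """Return the R factor of a full-column-rank matrix (list of rows).

    Uses modified Gram-Schmidt; assumes rows >= columns and full column rank.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("each block needs at least as many rows as columns")
    q_cols = []                       # orthonormal columns built so far
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(m)]
        for k, q in enumerate(q_cols):
            # Project the partially orthogonalized column onto q, then remove it.
            r[k][j] = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - r[k][j] * qi for qi, vi in zip(q, v)]
        r[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / r[j][j] for vi in v])
    return r


def tsqr_r(a, block_rows):
    """R factor of a tall-skinny matrix via a binary reduction tree.

    Hypothetical helper: block_rows is an assumed tuning parameter.
    """
    # Step 1: independent QR of each row block (these could run in parallel).
    rs = [qr_r(a[i:i + block_rows]) for i in range(0, len(a), block_rows)]
    # Step 2: stack pairs of small R factors and factor them again.
    while len(rs) > 1:
        merged = [qr_r(rs[i] + rs[i + 1]) for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2:
            merged.append(rs[-1])     # odd block out moves up one level
        rs = merged
    return rs[0]


# Illustrative 8 x 3 tall-skinny matrix with rows [1, x, x^2] for x = 0..7.
a = [[1.0, float(x), float(x * x)] for x in range(8)]
r_tree = tsqr_r(a, block_rows=4)
r_direct = qr_r(a)
```

Because both routines produce an R factor with a positive diagonal, `r_tree` and `r_direct` should agree up to rounding error. Only the small n-by-n R factors move between levels of the tree, which is where the reduction in communication comes from in this sketch.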
