Conference: IEEE International Symposium on Parallel & Distributed Processing

QR factorization of tall and skinny matrices in a grid computing environment

Abstract

Previous studies have reported that common dense linear algebra operations do not achieve speedup when run across multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization - one of the main dense linear algebra kernels - of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to combine a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that, on large-scale problems, the resulting performance increases linearly with the number of geographical sites and is consistently higher than ScaLAPACK's.
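The idea of trading extra flops for fewer messages on a tall and skinny matrix can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes NumPy, the names `tsqr` and `num_blocks` are hypothetical, each row block stands in for one geographical site, `np.linalg.qr` stands in for the site-local ScaLAPACK call, and the reduction is a single flat step rather than a tree.

```python
import numpy as np


def tsqr(A, num_blocks):
    """One-level tall-skinny QR sketch: returns (Q, R) with A = Q @ R.

    Each row block plays the role of one site. Only the small n-by-n
    R factors would need to cross site boundaries.
    """
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)
    if min(b.shape[0] for b in blocks) < n:
        raise ValueError("every row block needs at least n rows")

    # Step 1: independent local QR on each block (no inter-site traffic).
    local = [np.linalg.qr(b) for b in blocks]  # reduced mode: Q_i, R_i

    # Step 2: gather the small R_i factors and factor the stacked matrix.
    R_stack = np.vstack([r for _, r in local])  # shape (num_blocks * n, n)
    Q2, R = np.linalg.qr(R_stack)

    # Step 3: each block updates its local Q_i with its n-row slice of Q2.
    Q2_parts = np.split(Q2, num_blocks, axis=0)
    Q = np.vstack([q @ q2 for (q, _), q2 in zip(local, Q2_parts)])
    return Q, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 8))  # tall and skinny: m >> n
    Q, R = tsqr(A, num_blocks=4)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(8)))
```

In this sketch the only data that would travel between blocks is the set of n-by-n R factors, which is why the approach suits networks where inter-site messages are expensive.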