Published in: IEEE Transactions on Parallel and Distributed Systems

Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures



Abstract

The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a preprocessing step for calculating the Singular Value Decomposition. Furthermore, in the Top500 list of June 2008, 98 percent of the fastest parallel systems in the world were based on multicores. This confronts the scientific software community with both a daunting challenge and a unique opportunity. The challenge arises from the disturbing mismatch between the design of systems based on this new chip architecture (hundreds of thousands of nodes, a million or more cores, reduced bandwidth and memory available to cores) and the components of the traditional software stack, such as numerical libraries, on which scientific applications have relied for their accuracy and performance. The many-core trend has exacerbated the problem further, and it becomes critical to efficiently integrate existing or new numerical linear algebra algorithms suitable for such hardware. By exploiting the concept of tile algorithms in the multicore environment (i.e., high level of parallelism with fine granularity and high-performance data representation combined with a dynamic data-driven execution), the band bidiagonal reduction presented here achieves 94 Gflop/s on a 12,000 × 12,000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of the tile algorithms approach for the bidiagonal reduction is that the full reduction cannot be obtained in one stage. Other methods have to be considered to further reduce the band matrix to the required form.
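To make the idea of a two-sided reduction to band bidiagonal form concrete, the following is a minimal NumPy sketch. It is not the tile algorithm of the paper: there are no tiles, no task scheduling, and no parallelism. It only alternates a QR factorization of a column panel (applied from the left) with an LQ factorization of a block row (applied from the right). The function name `band_bidiagonalize` and the block-size parameter `nb` are assumptions introduced here for illustration.

```python
import numpy as np

def band_bidiagonalize(A, nb):
    """Reduce A to upper band form (bandwidth nb) by orthogonal
    transformations applied from the left and from the right.

    Illustrative dense sketch only; names and structure are assumptions,
    not the tile algorithm described in the abstract.
    """
    B = np.array(A, dtype=float)  # work on a copy
    m, n = B.shape
    for k in range(0, min(m, n), nb):
        # Left step: QR of the column panel starting at the diagonal block.
        # Applying Q^T zeroes the panel below its diagonal.
        Q, _ = np.linalg.qr(B[k:, k:k + nb], mode="complete")
        B[k:, k:] = Q.T @ B[k:, k:]

        # Right step: LQ of the block row to the right of the diagonal block,
        # computed as the QR of its transpose. Applying Qr from the right
        # makes that block row lower triangular.
        if k + nb < n:
            Qr, _ = np.linalg.qr(B[k:k + nb, k + nb:].T, mode="complete")
            B[k:, k + nb:] = B[k:, k + nb:] @ Qr
    return B
```

Because only orthogonal transformations are applied, the band matrix returned has the same singular values as the input. A second stage, as the abstract notes, would still be needed to go from the band form to a true bidiagonal matrix.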
