Conference: IEEE International Symposium on Parallel and Distributed Processing

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA



Abstract

We present a method for developing dense linear algebra algorithms that scale seamlessly to thousands of cores. The method is realized in our DPLASMA (Distributed PLASMA) project, which is built on DAGuE, a novel generic distributed Directed Acyclic Graph (DAG) engine. The engine is designed for high-performance computing and enables tile algorithms originating in PLASMA (Parallel Linear Algebra for Scalable Multi-core Architectures) to scale on large distributed-memory systems. The underlying DAGuE framework has several appealing features for distributed-memory platforms with heterogeneous multicore nodes: a DAG representation that is independent of the problem size, automatic extraction of communication from the dependencies, overlap of communication and computation, task prioritization, and architecture-aware scheduling and management of tasks. The originality of the engine lies in its capacity to translate sequential code with nested loops into a concise and synthetic format that can then be interpreted and executed in a distributed environment. We take three common dense linear algebra algorithms from PLASMA, namely the Cholesky, LU, and QR factorizations, and investigate their data-driven expression and execution in a distributed system. Experimental results on the Cray XT5 Kraken system demonstrate that our DAG-based approach has the potential to achieve a sizable fraction of peak performance, which is characteristic of state-of-the-art distributed numerical software on current and emerging architectures.
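To make the idea of a data-driven expression concrete, the following is a minimal sketch, not DAGuE's actual format or API. It assumes the standard right-looking tile Cholesky loop nest (POTRF, TRSM, SYRK, GEMM kernels on the lower triangle) and derives task dependencies by recording the last writer of each tile. The function names `cholesky_tasks`, `build_dag`, and `wavefronts` are hypothetical, introduced only for illustration. Note that this sketch enumerates every task, so its size grows with the number of tiles, whereas the abstract describes a representation that is independent of the problem size.

```python
def cholesky_tasks(nt):
    """Yield (task, tiles_read, tile_written) in sequential loop order.

    Assumption: right-looking tile Cholesky on the lower triangle of an
    nt x nt tile matrix. The written tile is also read (it is in-out).
    """
    for k in range(nt):
        yield ("POTRF", k), [], (k, k)
        for m in range(k + 1, nt):
            yield ("TRSM", m, k), [(k, k)], (m, k)
        for m in range(k + 1, nt):
            yield ("SYRK", m, k), [(m, k)], (m, m)
            for n in range(k + 1, m):
                yield ("GEMM", m, n, k), [(m, k), (n, k)], (m, n)


def build_dag(nt):
    """Map each task to the set of tasks it must wait for.

    Only last-writer (read-after-write and write-after-write) dependencies
    are tracked. That is sufficient for this loop nest because a tile is
    only read by other tasks after its final write.
    """
    last_writer = {}
    deps = {}
    for task, reads, write in cholesky_tasks(nt):
        touched = reads + [write]
        deps[task] = {last_writer[t] for t in touched if t in last_writer}
        last_writer[write] = task
    return deps


def wavefronts(deps):
    """Group tasks into levels; tasks within one level are independent."""
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


if __name__ == "__main__":
    for level, tasks in enumerate(wavefronts(build_dag(3))):
        print(level, tasks)
```

Each level printed by the usage example is a set of tasks that a runtime could, in principle, execute concurrently; a real scheduler would release tasks as soon as their inputs arrive rather than level by level, which is what allows communication and computation to overlap.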
