Journal: Parallel Computing

Basker: Parallel sparse LU factorization utilizing hierarchical parallelism and data layouts



Abstract

Transient simulation in circuit simulation tools, such as SPICE and Xyce, depends on scalable and robust sparse LU factorizations for efficient numerical simulation of circuits and power grids. As the need for simulations of very large circuits grows, the prevalence of multicore architectures enables us to use shared-memory parallel algorithms for such simulations. A parallel factorization is a critical component of such shared-memory parallel simulations. We develop a parallel sparse factorization algorithm that solves problems from circuit simulations efficiently and maps well to architectural features. This new factorization algorithm exposes hierarchical parallelism to accommodate the irregular structures that arise in our target problems. It also uses a hierarchical two-dimensional data layout, which reduces synchronization costs and maps to the memory hierarchy found in multicore processors. We present an OpenMP-based implementation of the parallel algorithm in a new multithreaded solver called Basker in the Trilinos framework. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida Sparse Matrix Collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91x on CPU (16 cores) and 7.4x on Xeon Phi (32 cores) relative to the state-of-the-art solver KLU. Basker outperforms the Intel MKL Pardiso solver (PMKL) by as much as 30x on CPU (16 cores) and 7.5x on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides a 5.4x speedup on a challenging matrix sequence taken from an actual Xyce simulation. © 2017 Published by Elsevier B.V.
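The abstract reports its headline results as geometric mean speedups over a set of matrices. As a minimal sketch of how such a figure is typically computed, the Python snippet below takes per-matrix factorization times for a baseline solver and a new solver and returns the geometric mean of the per-matrix ratios. The function name `geometric_mean_speedup` and the timing lists are hypothetical, and the numbers are made up for illustration; they are not taken from the paper.

```python
import math


def geometric_mean_speedup(baseline_times, new_times):
    """Return the geometric mean of the per-matrix speedups baseline/new."""
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need two non-empty sequences of equal length")
    # Average the logarithms of the ratios, then exponentiate. This equals
    # the n-th root of the product of the ratios but is numerically safer.
    log_sum = sum(math.log(b / n) for b, n in zip(baseline_times, new_times))
    return math.exp(log_sum / len(baseline_times))


# Made-up timings in seconds, for illustration only (not from the paper).
klu_seconds = [8.0, 2.0, 4.0]      # hypothetical baseline solver times
basker_seconds = [1.0, 1.0, 2.0]   # hypothetical parallel solver times

speedup = geometric_mean_speedup(klu_seconds, basker_seconds)
print(f"geometric mean speedup: {speedup:.2f}x")
```

The geometric mean is the usual choice for averaging ratios because one matrix with an unusually large speedup skews it less than it would skew an arithmetic mean.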
