Workshop on the LLVM Compiler Infrastructure in HPC; International Conference for High Performance Computing, Networking, Storage and Analysis; Workshop on Hierarchical Parallelism for Exascale Computing

Introducing multi-level parallelism, at coarse, fine and instruction level to enhance the performance of iterative solvers for large sparse linear systems on Multi- and Many-core architecture



Abstract

With the evolution of High Performance Computing, multi-core and many-core systems are now a common feature of new hardware architectures. The introduction of very large numbers of cores at the processor level is challenging because it requires handling parallelism at multiple levels, both coarse and fine, to fully exploit the offered computing power. The induced programming effort can be mitigated by parallel programming models based on the data-flow model and the task programming paradigm [1]. To do so, many standard numerical algorithms must be revisited, as they cannot easily be parallelized at the finest levels. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse, unstructured matrices - such as Incomplete LU factorization (ILU) or Algebraic Multigrid (AMG) - fail to scale on shared-memory architectures with large numbers of cores. In this paper we reconsider preconditioning algorithms to better introduce multi-level parallelism: at the coarse level with MPI, at the fine level with threads, and at the instruction level to enable SIMD optimizations. The paper illustrates how we enhance the implementation of preconditioners such as the multilevel domain decomposition (DDML) preconditioners [2], based on the popular Additive Schwarz Method (ASM), and the classical ILU0 preconditioner with the fine-grained parallel fixed-point variant presented in [3]. Our approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The robustness of the preconditioners is tested with respect to the data heterogeneities of the study cases. We evaluate the extensibility of our implementation with respect to model size, and its scalability on the large core counts provided by new KNL processors and multi-node clusters.
