Journal of Parallel and Distributed Computing

Asynchronous and multithreaded communications on irregular applications using vectorized divide and conquer approach

Abstract

The evolution of hardware architectures, driven by increasing requirements for performance and energy efficiency, has led to complex HPC systems. In the context of Finite Element Methods, exposing massive parallelism in unstructured mesh computations with efficient load balancing and minimal synchronization is challenging. Several parallelization strategies have to be combined to exploit the multiple levels of parallelism. We propose several contributions aimed at handling irregular codes and data structures efficiently. We have developed a hybrid parallelization approach based on the Divide & Conquer (D&C) principle, which combines the distributed, shared, and vectorial forms of parallelism in a fine-grain, task-based approach applied to irregular structures. We evaluate our approach on a matrix assembly step of an industrial application from Dassault Aviation, on standard Xeon multicores and Xeon Phi KNC manycores. On 512 Intel Xeon E5-2670 Sandy Bridge cores, we surpass the pure MPI approach by up to 3.47× and reach 77% parallel efficiency using only 2000 vertices per core. On 4 Xeon Phi 5110p KNC, D&C performs similarly to 96 Intel Xeon E5-2670 Sandy Bridge cores; it achieves an excellent parallel efficiency of 96% and up to 6.56× speedup compared to pure MPI.

Highlights

  • A Divide and Conquer parallelization algorithm on unstructured meshes.
  • A coloring algorithm heuristic for vectorization, and a data parallelism model.
  • Integration of PGAS multithreaded communications with fine grain tasks.
  • A D&C library demonstrated in a real industrial application.
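The abstract mentions a coloring heuristic for vectorization but does not describe it. As a rough illustration of the general idea only, and not the authors' algorithm, the sketch below greedily colors mesh elements so that no two elements of the same color share a vertex. Elements in one color can then be assembled as a batch without conflicting writes to the same vertex. The function names, the tuple-of-vertex-ids element representation, and the tiny mesh are all assumptions made for this example.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring (illustrative): elements sharing a vertex get different colors.

    `elements` is a list of tuples of vertex ids (a hypothetical mesh format).
    Returns a list with one color index per element.
    """
    # vertex id -> set of colors already taken by elements touching that vertex
    vertex_colors = defaultdict(set)
    colors = []
    for elem in elements:
        # Collect every color used by a previously colored neighbor
        used = set()
        for v in elem:
            used |= vertex_colors[v]
        # Pick the smallest color not used by any neighbor
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for v in elem:
            vertex_colors[v].add(c)
    return colors


def group_by_color(colors):
    """Return element indices grouped by color, in increasing color order."""
    groups = defaultdict(list)
    for idx, c in enumerate(colors):
        groups[c].append(idx)
    return [groups[c] for c in sorted(groups)]


# Hypothetical 2D mesh: four triangles over vertices 0..5
mesh = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (2, 3, 4)]
colors = color_elements(mesh)
batches = group_by_color(colors)
```

Each entry of `batches` lists elements with pairwise disjoint vertex sets, so a vectorized assembly loop over one batch would not need atomic updates. A production heuristic would likely also balance batch sizes against the vector width, which this sketch does not attempt.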
