Journal: Future Generation Computer Systems

Extending τ-Lop to model concurrent MPI communications in multicore clusters

Abstract

Achieving optimal performance of MPI applications on current multicore architectures, which comprise multiple shared communication channels and deep memory hierarchies, is not trivial. Formal analysis with parallel performance models makes it possible to describe the underlying behavior of algorithms and their communication complexity, with the aim of estimating their cost and improving their performance. The LogGP model was originally conceived to predict the cost of algorithms on mono-processor clusters, based on point-to-point transmissions and on parameters describing network latency and bandwidth. It remains the reference model, and multiple extensions have been proposed to handle high-performance networks, particular contention cases, channel hierarchies, or protocol costs. These highly specific branches cause LogGP to partially lose its original purpose as an abstract model. The more recent log_nP model represents a point-to-point transmission as a sequence of implicit transfers, or data movements. Nevertheless, like LogGP, it models an algorithm on a parallel architecture as a sequence of message transmissions, an approach that is ineffective for modeling algorithms more advanced than simple tree-based ones, as we show in this work. In this paper, the τ-Lop model is extended to multicore clusters and compared with previous models. The extended model predicts, with high accuracy, the cost of advanced algorithms and mechanisms used by mainstream MPI implementations such as MPICH and Open MPI. τ-Lop is based on the concept of concurrent transfers, which it uses to meaningfully represent the behavior of parallel algorithms on complex platforms with hierarchical shared communication channels, taking into account the effects of contention and of the placement of processes on processors. In addition, an exhaustive and reproducible methodology for measuring the model parameters is described.
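To make the "sequence of message transmissions" view concrete, the following is a minimal sketch of the standard LogGP point-to-point cost, o + (k-1)G + L + o for a k-byte message, plus a simplified binomial-tree broadcast estimate that charges one transmission per round over ceil(log2 P) rounds. This is not τ-Lop and not the paper's method: the class name `LogGP`, its methods, and all parameter values are hypothetical, and the broadcast formula deliberately ignores contention and process placement, which are exactly the effects τ-Lop is said to capture.

```python
import math
from dataclasses import dataclass


@dataclass
class LogGP:
    """Hypothetical container for LogGP parameters (times in microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead paid by sender and by receiver per message
    g: float  # minimum gap between consecutive messages (unused in this sketch)
    G: float  # gap per byte for long messages (us/byte)
    P: int    # number of processes

    def p2p(self, k: int) -> float:
        """LogGP time for one k-byte message: o + (k - 1) * G + L + o."""
        return 2 * self.o + (k - 1) * self.G + self.L

    def binomial_bcast(self, k: int) -> float:
        """Simplified estimate: ceil(log2 P) rounds, one point-to-point
        transmission per round; ignores g, contention and placement."""
        rounds = math.ceil(math.log2(self.P)) if self.P > 1 else 0
        return rounds * self.p2p(k)


if __name__ == "__main__":
    model = LogGP(L=5.0, o=1.0, g=2.0, G=0.5, P=8)
    print(model.p2p(101))             # one 101-byte message
    print(model.binomial_bcast(101))  # broadcast of 101 bytes to 8 processes
```

Because every cost here reduces to a sum of independent point-to-point terms, the estimate cannot express several transfers sharing one channel at the same time, which is the limitation the abstract attributes to LogGP-style modeling.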
