
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP


Abstract

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor, it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grained process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
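To make the hybrid model concrete, the following minimal sketch illustrates the MPI/OpenMP pattern described above: MPI provides the coarse-grained decomposition across private address spaces, while OpenMP parallelizes a loop within each process. The local array size, the neighbor exchange, and the MPI_THREAD_FUNNELED thread level are illustrative assumptions and are not taken from the paper or from the application studied there.

/* Hybrid MPI/OpenMP sketch (assumed example, not from the paper):
 * MPI handles coarse-grained parallelism across processes,
 * OpenMP handles fine-grained loop-level parallelism within each process. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* local problem size per MPI process (assumed) */

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request a thread level so OpenMP threads may coexist with MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *u = malloc(N * sizeof(double));

    /* Fine-grained, loop-level parallelism inside each MPI process. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        u[i] = rank + 0.001 * i;

    /* Coarse-grained communication between private address spaces:
     * each process sends one boundary value to its right neighbor
     * and receives one from its left neighbor. */
    double send = u[N - 1], recv = 0.0;
    int right = (rank + 1) % size, left = (rank + size - 1) % size;
    MPI_Sendrecv(&send, 1, MPI_DOUBLE, right, 0,
                 &recv, 1, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %f using up to %d OpenMP threads\n",
           rank, recv, omp_get_max_threads());

    free(u);
    MPI_Finalize();
    return 0;
}

In contrast, an MLP-style program would replace the explicit message exchange with reads and writes to a shared memory arena visible to all processes, as noted in the abstract.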
