
High performance and scalable MPI intra-node communication middleware for multi-core clusters.


Abstract

Clusters of workstations are among the most popular architectures in high performance computing, thanks to their cost-to-performance effectiveness. As multi-core technologies become mainstream, more and more clusters deploy multi-core processors as the building unit. In the latest Top500 supercomputer list, published in November 2008, about 85% of the sites use multi-core processors from Intel and AMD. The Message Passing Interface (MPI) is one of the most popular programming models for cluster computing. With the increased deployment of multi-core systems in clusters, considerable communication is expected to take place within a node. This suggests that MPI intra-node communication will play a key role in overall application performance.

This dissertation presents novel MPI intra-node communication designs, including a user-level shared-memory-based approach, a kernel-assisted direct copy approach, and an efficient multi-core-aware hybrid approach. The user-level shared-memory-based approach is portable across operating systems and platforms. Processes copy messages into and out of a shared memory area to communicate. The shared buffers are organized so as to be efficient in both cache utilization and memory usage. The kernel-assisted direct copy approach takes help from the operating system kernel and copies a message directly from one process to another, so it needs only one copy and improves performance over the shared-memory-based approach. In this approach, the memory copy can be either CPU based or DMA based. This dissertation explores both directions; for DMA-based memory copy, we take advantage of novel mechanisms such as Intel I/OAT to achieve better performance and computation-communication overlap. To optimize performance on multi-core systems, we efficiently combine the shared memory approach and the kernel-assisted direct copy approach and propose a topology-aware and skew-aware hybrid approach.
The dissertation also presents a comprehensive performance evaluation and analysis of these approaches on contemporary multi-core systems such as an Intel Clovertown cluster and an AMD Barcelona cluster, both of which are quad-core-processor-based systems.

Software developed as part of this dissertation is available in MVAPICH and MVAPICH2, which are popular open-source implementations of the MPI-1 and MPI-2 libraries over InfiniBand and other RDMA-enabled networks and are used by several hundred top computing sites around the world.

Bibliographic details

  • Author

    Chai, Lei.

  • Author affiliation

    The Ohio State University.

  • Degree grantor: The Ohio State University.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 155 p.
  • Total pages: 155
  • Format: PDF
  • Language: eng
  • CLC classification: Automation and computer technology
  • Keywords

