Improving message-passing performance and scalability in high-performance clusters.

Abstract

High Performance Computing (HPC) is the key to solving many scientific, financial, and engineering problems. Computer clusters are now the dominant architecture for HPC. The scale of clusters, both in terms of processors per node and the number of nodes, is increasing rapidly, with systems now reaching petascale and soon approaching exascale. Inter-process communication plays a significant role in the overall performance of HPC applications. With the continuous enhancements in interconnection technologies and node architectures, the Message Passing Interface (MPI) needs to be improved to effectively utilize these modern technologies for higher performance.

To improve communication progress and overlap in large-message transfers, a method is proposed that uses speculative communication to overlap communication with computation in the MPI Rendezvous protocol. The results show up to 100% communication progress and more than 80% overlap ability over iWARP Ethernet. An adaptation mechanism is employed to avoid overhead on applications that, because of their timing characteristics, do not benefit from the method.

To reduce MPI communication latency, I have proposed a technique that exploits the application's buffer-reuse characteristics for small messages and eliminates the sender-side copy in both two-sided and one-sided MPI small-message transfer protocols. The implementation over InfiniBand improves small-message latency by up to 20%, and it adaptively falls back to the current method if the application does not benefit from the proposed technique.

Finally, to improve the scalability of MPI applications on ultra-scale clusters, I have proposed an extension to the current iWARP standard. The extension equips Ethernet with an efficient zero-copy, connection-less datagram transport, improving performance and memory usage on large-scale clusters. The software-level evaluation shows more than 40% performance improvement and a 30% reduction in memory usage for MPI applications on a 64-core cluster.

After providing a background, I present a deep analysis of the user-level and MPI libraries over modern cluster interconnects: InfiniBand, iWARP Ethernet, and Myrinet. Using novel techniques, I assess characteristics such as overlap and communication progress ability, the effect of buffer reuse on latency, and multiple-connection scalability. The outcome highlights some of the inefficiencies that exist in these communication libraries.
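For illustration, the following is a minimal sketch, not taken from the dissertation, of the kind of microbenchmark typically used to assess communication progress and computation/communication overlap for large (Rendezvous-range) messages: a non-blocking transfer is posted, independent computation runs while the message should be progressing in the background, and the total time indicates how much of the transfer was hidden. The message size and the busy_compute helper are illustrative assumptions.

    /* overlap_sketch.c -- illustrative shape of a communication/computation
     * overlap microbenchmark for large (Rendezvous-range) messages.
     * Build: mpicc overlap_sketch.c -o overlap_sketch
     * Run:   mpirun -np 2 ./overlap_sketch                                  */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MSG_SIZE (1 << 20)   /* 1 MiB: large enough for the Rendezvous path in typical MPI stacks */

    /* Independent computation that never touches the message buffer. */
    static double busy_compute(long n)
    {
        double s = 0.0;
        for (long i = 1; i <= n; i++)
            s += 1.0 / (double)i;
        return s;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(MSG_SIZE);
        memset(buf, 0, MSG_SIZE);

        MPI_Request req;
        double sink = 0.0;

        double t0 = MPI_Wtime();
        if (rank == 0) {
            /* Post the large send early, compute, then wait: with good
             * progress/overlap the transfer completes in the background
             * while busy_compute() runs. */
            MPI_Isend(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
            sink += busy_compute(10000000L);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Irecv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req);
            sink += busy_compute(10000000L);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        double total = MPI_Wtime() - t0;

        if (rank == 0)
            printf("transfer + inserted computation: %.6f s (sink=%g)\n",
                   total, sink);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Comparing this elapsed time against the transfer time and computation time measured separately indicates how much of the communication an MPI library hides behind the computation, which is the quantity the overlap and progress results above refer to.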
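The small-message technique described above builds on a common application pattern: the same application buffer is posted repeatedly for messages of the same small size. The ping-pong loop below is a hypothetical sketch rather than the dissertation's implementation; it shows that reuse pattern, which an MPI library could detect in order to transmit directly from the (already registered) application buffer instead of first copying the payload into an internal eager buffer on the sender side. Buffer size and iteration count are assumptions.

    /* reuse_pingpong.c -- illustrative small-message ping-pong in which the
     * same application buffer is reused on every iteration (the pattern the
     * proposed technique exploits to skip the sender-side copy).
     * Build: mpicc reuse_pingpong.c -o reuse_pingpong
     * Run:   mpirun -np 2 ./reuse_pingpong                                  */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE 64      /* well inside the eager/small-message range */
    #define ITERS    10000

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[MSG_SIZE] = {0};   /* one buffer, reused for every send/recv */

        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double elapsed = MPI_Wtime() - t0;

        if (rank == 0)
            printf("average one-way latency: %.3f us\n",
                   elapsed / (2.0 * ITERS) * 1e6);

        MPI_Finalize();
        return 0;
    }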

Bibliographic information

  • Author

    Rashti, Mohammad Javad.

  • Author affiliation

    Queen's University (Canada).

  • Degree-granting institution: Queen's University (Canada).
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 184 p.
  • Total pages: 184
  • Original format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
