
Automatically tuning collective communication for one-sided programming models.


Abstract

Technology trends suggest that future machines will rely on parallelism to meet increasing performance requirements. To aid programmer productivity and application performance, many parallel programming models provide communication building blocks called collective communication. These operations, such as Broadcast, Scatter, Gather, and Reduce, abstract common global data movement patterns behind a simple library interface, allowing the hardware and runtime system to optimize them for performance and scalability.

We consider the problem of optimizing collective communication in Partitioned Global Address Space (PGAS) languages. Rooted in traditional shared memory programming models, these languages deliver the benefits of sophisticated distributed data structures using language extensions and one-sided communication, which allows one processor to directly read and write memory associated with another. Many popular PGAS language implementations share a common runtime system, GASNet, for implementing such communication. To provide a highly scalable platform for our work, we present a new implementation of GASNet for the IBM BlueGene/P, allowing GASNet to scale to tens of thousands of processors.

We demonstrate that PGAS languages are highly scalable and that the one-sided communication within them is an efficient and convenient platform for collective communication. We show how to use one-sided communication to achieve 3x improvements in the latency and throughput of the collectives over standard message-passing implementations. Using a 3D FFT as a representative communication-bound benchmark, for example, we see a 17% increase in performance on 32,768 cores of the BlueGene/P and a 1.5x improvement on 1024 cores of the Cray XT4. We also show how the automatically tuned collectives can deliver more than an order of magnitude improvement in performance over existing implementations on shared memory platforms.

There is no single best algorithm that serves all machines and usage patterns, which demonstrates the need for tuning; we therefore build an automatic tuning system in GASNet that optimizes the collectives for a variety of large-scale supercomputers and novel multicore architectures. To understand the large search space, we construct analytic performance models and use them to minimize the overhead of autotuning. We demonstrate that autotuning is an effective approach to addressing performance optimizations on complex parallel systems.
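
To make the one-sided model described above concrete, the following is a minimal sketch of a remote write using the GASNet-1 core API. Error checking and segment sizing are elided and details vary by conduit; this illustrates the mechanism, not the thesis's tuned implementation.

```c
/* Sketch: one-sided remote write with the GASNet-1 core API. */
#include <gasnet.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    gasnet_init(&argc, &argv);
    gasnet_attach(NULL, 0, 4 * 1024 * 1024, 0);  /* no AM handlers, 4 MB segment */

    gasnet_node_t me    = gasnet_mynode();
    gasnet_node_t nodes = gasnet_nodes();

    /* Discover where every node's registered segment lives. */
    gasnet_seginfo_t *seg = malloc(nodes * sizeof(gasnet_seginfo_t));
    gasnet_getSegmentInfo(seg, nodes);

    /* Node 0 writes directly into node 1's memory; node 1's CPU takes
     * no part in the transfer -- that is the one-sided model. */
    if (me == 0 && nodes > 1) {
        int value = 42;
        gasnet_put(1, seg[1].addr, &value, sizeof value);
    }

    gasnet_barrier_notify(0, GASNET_BARRIERFLAG_ANONYMOUS);
    gasnet_barrier_wait(0, GASNET_BARRIERFLAG_ANONYMOUS);

    if (me == 1) printf("node 1 received %d\n", *(int *)seg[1].addr);
    gasnet_exit(0);
    return 0;
}
```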
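
The tuning loop itself can be sketched in a few lines of C. All names below (algo_t, bcast_flat, bcast_binomial) are hypothetical rather than GASNet's API; the point is how an analytic latency-bandwidth model can prune the candidate space before anything is timed, which is how models keep autotuning overhead low.

```c
/* Illustrative autotuning loop: rank candidate broadcast algorithms with
 * an analytic cost model, then time only the plausible ones. */
#include <stdio.h>
#include <stddef.h>
#include <time.h>

typedef struct {
    const char *name;
    double alpha;                /* modeled per-message latency (s) */
    double beta;                 /* modeled bandwidth (bytes/s) */
    int rounds;                  /* messages on the critical path */
    void (*run)(size_t nbytes);  /* candidate implementation */
} algo_t;

static void bcast_flat(size_t n)     { (void)n; /* stand-in for a real algorithm */ }
static void bcast_binomial(size_t n) { (void)n; /* stand-in for a real algorithm */ }

/* Analytic model: rounds * (alpha + nbytes / beta). */
static double model_cost(const algo_t *a, size_t nbytes) {
    return a->rounds * (a->alpha + (double)nbytes / a->beta);
}

static double measure(const algo_t *a, size_t nbytes) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    a->run(nbytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    algo_t cand[] = {
        { "flat",     2e-6, 1e9, 1,  bcast_flat },
        { "binomial", 2e-6, 1e9, 10, bcast_binomial },  /* log2(P) rounds, P=1024 */
    };
    const size_t ncand  = sizeof cand / sizeof cand[0];
    const size_t nbytes = 64 * 1024;

    /* Pass 1: find the model's best predicted cost. */
    double best_model = model_cost(&cand[0], nbytes);
    for (size_t i = 1; i < ncand; i++) {
        double c = model_cost(&cand[i], nbytes);
        if (c < best_model) best_model = c;
    }

    /* Pass 2: measure only candidates the model places within 2x of the
     * predicted optimum, skipping the rest to cut tuning overhead. */
    const algo_t *best = NULL;
    double best_t = 0.0;
    for (size_t i = 0; i < ncand; i++) {
        if (model_cost(&cand[i], nbytes) > 2.0 * best_model) continue;
        double t = measure(&cand[i], nbytes);
        if (!best || t < best_t) { best = &cand[i]; best_t = t; }
    }
    printf("selected %s for %zu-byte broadcasts\n", best->name, nbytes);
    return 0;
}
```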

Bibliographic details

  • Author: Nishtala, Rajesh.
  • Affiliation: University of California, Berkeley.
  • Degree grantor: University of California, Berkeley.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 176 p.
  • Format: PDF
  • Language: English
