
Automatically tuning collective communication for one-sided programming models.


Abstract

Technology trends suggest that future machines will rely on parallelism to meet increasing performance requirements. To aid programmer productivity and application performance, many parallel programming models provide communication building blocks called collective communication. These operations, such as Broadcast, Scatter, Gather, and Reduce, abstract common global data movement patterns behind a simple library interface, allowing the hardware and runtime system to optimize them for performance and scalability.

We consider the problem of optimizing collective communication in Partitioned Global Address Space (PGAS) languages. Rooted in traditional shared memory programming models, these languages deliver the benefits of sophisticated distributed data structures using language extensions and one-sided communication, which allows one processor to directly read and write memory associated with another. Many popular PGAS language implementations share a common runtime system, GASNet, for implementing such communication. To provide a highly scalable platform for our work, we present a new implementation of GASNet for the IBM BlueGene/P, allowing GASNet to scale to tens of thousands of processors.

We demonstrate that PGAS languages are highly scalable and that the one-sided communication within them is an efficient and convenient platform for collective communication. We show how to use one-sided communication to achieve 3x improvements in the latency and throughput of the collectives over standard message-passing implementations. Using a 3D FFT as a representative communication-bound benchmark, for example, we see a 17% increase in performance on 32,768 cores of the BlueGene/P and a 1.5x improvement on 1024 cores of the Cray XT4. We also show how the automatically tuned collectives can deliver more than an order of magnitude improvement in performance over existing implementations on shared memory platforms.

There is no single best algorithm that serves all machines and usage patterns, which demonstrates the need for tuning; we therefore build an automatic tuning system in GASNet that optimizes the collectives for a variety of large-scale supercomputers and novel multicore architectures. To understand the large search space, we construct analytic performance models and use them to minimize the overhead of autotuning. We demonstrate that autotuning is an effective approach to addressing performance optimizations on complex parallel systems.
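
To make the one-sided model described above concrete, the following is a minimal sketch of a remote write using the GASNet-1 core API. Error checking and segment sizing are elided and details vary by conduit; this illustrates the mechanism, not the thesis's tuned implementation.

```c
/* Sketch: one-sided remote write with the GASNet-1 core API. */
#include <gasnet.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    gasnet_init(&argc, &argv);
    gasnet_attach(NULL, 0, 4 * 1024 * 1024, 0);  /* no AM handlers, 4 MB segment */

    gasnet_node_t me    = gasnet_mynode();
    gasnet_node_t nodes = gasnet_nodes();

    /* Discover where every node's registered segment lives. */
    gasnet_seginfo_t *seg = malloc(nodes * sizeof(gasnet_seginfo_t));
    gasnet_getSegmentInfo(seg, nodes);

    /* Node 0 writes directly into node 1's memory; node 1's CPU takes
     * no part in the transfer -- that is the one-sided model. */
    if (me == 0 && nodes > 1) {
        int value = 42;
        gasnet_put(1, seg[1].addr, &value, sizeof value);
    }

    gasnet_barrier_notify(0, GASNET_BARRIERFLAG_ANONYMOUS);
    gasnet_barrier_wait(0, GASNET_BARRIERFLAG_ANONYMOUS);

    if (me == 1) printf("node 1 received %d\n", *(int *)seg[1].addr);
    gasnet_exit(0);
    return 0;
}
```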
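
The tuning loop itself can be sketched in a few lines of C. All names below (algo_t, bcast_flat, bcast_binomial) are hypothetical rather than GASNet's API; the point is how an analytic latency-bandwidth model can prune the candidate space before anything is timed, which is how models keep autotuning overhead low.

```c
/* Illustrative autotuning loop: rank candidate broadcast algorithms with
 * an analytic cost model, then time only the plausible ones. */
#include <stdio.h>
#include <stddef.h>
#include <time.h>

typedef struct {
    const char *name;
    double alpha;                /* modeled per-message latency (s) */
    double beta;                 /* modeled bandwidth (bytes/s) */
    int rounds;                  /* messages on the critical path */
    void (*run)(size_t nbytes);  /* candidate implementation */
} algo_t;

static void bcast_flat(size_t n)     { (void)n; /* stand-in for a real algorithm */ }
static void bcast_binomial(size_t n) { (void)n; /* stand-in for a real algorithm */ }

/* Analytic model: rounds * (alpha + nbytes / beta). */
static double model_cost(const algo_t *a, size_t nbytes) {
    return a->rounds * (a->alpha + (double)nbytes / a->beta);
}

static double measure(const algo_t *a, size_t nbytes) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    a->run(nbytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    algo_t cand[] = {
        { "flat",     2e-6, 1e9, 1,  bcast_flat },
        { "binomial", 2e-6, 1e9, 10, bcast_binomial },  /* log2(P) rounds, P=1024 */
    };
    const size_t ncand  = sizeof cand / sizeof cand[0];
    const size_t nbytes = 64 * 1024;

    /* Pass 1: find the model's best predicted cost. */
    double best_model = model_cost(&cand[0], nbytes);
    for (size_t i = 1; i < ncand; i++) {
        double c = model_cost(&cand[i], nbytes);
        if (c < best_model) best_model = c;
    }

    /* Pass 2: measure only candidates the model places within 2x of the
     * predicted optimum, skipping the rest to cut tuning overhead. */
    const algo_t *best = NULL;
    double best_t = 0.0;
    for (size_t i = 0; i < ncand; i++) {
        if (model_cost(&cand[i], nbytes) > 2.0 * best_model) continue;
        double t = measure(&cand[i], nbytes);
        if (!best || t < best_t) { best = &cand[i]; best_t = t; }
    }
    printf("selected %s for %zu-byte broadcasts\n", best->name, nbytes);
    return 0;
}
```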

Bibliographic details

  • Author: Nishtala, Rajesh.
  • Affiliation: University of California, Berkeley.
  • Degree grantor: University of California, Berkeley.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 176 p.
  • Format: PDF
  • Language: English
