Bamboo: Automatic Translation of MPI Source into a Latency-Tolerant Form.

Abstract

Communication remains a significant barrier to scalability on distributed-memory systems. The current trend in system architecture, which focuses on enhancing node performance, exacerbates the problem, since the relative cost of communication grows as the computation rate increases. The problem will be more pronounced at exascale, where computation rates will be orders of magnitude higher than those of current technology. Communication overlap is an effective way to hide communication by masking it behind computation. However, existing overlapping techniques not only require significant programming effort but also complicate the original program.

This dissertation presents a source-to-source translation framework that realizes communication overlap in applications written in MPI, a standard library for distributed-memory programming, without intrusive modifications to the source code. We explore a strategy based on reinterpreting MPI: the application is executed under a data-driven model that can hide communication overheads automatically. We reformulate MPI source into a task dependency graph, in which vertices represent tasks containing computation code and edges represent data dependencies among tasks. The task dependency graph maintains a partial ordering over the execution of tasks, enabling the program to execute in a data-driven fashion under the guidance of an external runtime system. To automate the translation process, we develop Bamboo, a source-to-source translator. Bamboo supports a rich set of MPI routines, including point-to-point, collective, and communicator-splitting operations.

We show that, for a variety of applications, Bamboo is able to hide communication overheads on a wide range of platforms, including traditional clusters of multicore processors as well as platforms based on accelerators (NVIDIA GPUs) and coprocessors (Intel MIC). Specifically, we translate applications drawn from three different application motifs: dense linear algebra, structured grids, and unstructured grids. In all cases, Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation. The performance of applications translated with Bamboo meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it also demonstrates the utility of semantic-level optimization against a well-known library.
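The overlap idea can be illustrated with the kind of hand-written restructuring that Bamboo is meant to automate. The sketch below is a single-process Python simulation, not MPI code. `fake_halo_exchange` is a hypothetical stand-in for posting nonblocking sends and receives (in MPI, `MPI_Isend`/`MPI_Irecv` followed by `MPI_Wait`), and a thread pool plays the role of the network. In a 1D three-point stencil, the interior points need only local data, so they are computed while the exchange is in flight. The two boundary points wait for the ghost values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_halo_exchange(left_src, right_src, delay=0.05):
    """Hypothetical stand-in for a nonblocking halo exchange plus wait.

    Sleeps to mimic network latency, then returns the ghost values that
    the left and right neighbors would have sent.
    """
    time.sleep(delay)
    return left_src, right_src


def stencil_step_blocking(u, left_src, right_src):
    """Reference version: wait for the ghosts first, then update every point."""
    left, right = fake_halo_exchange(left_src, right_src)
    padded = [left] + list(u) + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def stencil_step_overlapped(u, left_src, right_src, pool):
    """Same update, but the interior is computed while the exchange is pending."""
    n = len(u)
    if n < 2:
        raise ValueError("local block needs at least two points")
    new = [0.0] * n
    # 1. Start the "communication" without blocking on it.
    halo = pool.submit(fake_halo_exchange, left_src, right_src)
    # 2. Interior points depend only on local data: overlap them with the exchange.
    for i in range(1, n - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    # 3. Wait for the ghost values, then finish the two boundary points.
    left, right = halo.result()
    new[0] = (left + u[0] + u[1]) / 3.0
    new[n - 1] = (u[n - 2] + u[n - 1] + right) / 3.0
    return new


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0, 4.0]
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(stencil_step_overlapped(u, 0.0, 5.0, pool))  # [1.0, 2.0, 3.0, 4.0]
    print(stencil_step_blocking(u, 0.0, 5.0))              # [1.0, 2.0, 3.0, 4.0]
```

Even this tiny example requires the programmer to split the update into interior and boundary phases and to place the wait by hand. That is the extra programming effort and added complexity the abstract attributes to existing overlapping techniques.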

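The task dependency graph representation can also be sketched in a few lines. This is a hypothetical, minimal Python model of data-driven execution, not Bamboo's runtime or API. `TaskGraph`, `add_task`, and `run` are illustrative names. Each vertex is a callable, and each edge passes a producer's result to a consumer. The scheduler submits a task to a thread pool as soon as all of its inputs exist, so the graph imposes only a partial order. A task standing for a message receive and a task doing purely local work have no edge between them, so they may run concurrently. This is how overlap follows from the representation itself.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class TaskGraph:
    """Hypothetical task dependency graph: vertices are tasks, edges are data dependencies."""

    def __init__(self):
        self.funcs = {}  # task name -> callable
        self.deps = {}   # task name -> producer names, in argument order

    def add_task(self, name, func, deps=()):
        self.funcs[name] = func
        self.deps[name] = list(deps)

    def run(self, max_workers=4):
        """Execute tasks in data-driven order and return {task name: result}."""
        consumers = {name: [] for name in self.funcs}
        remaining = {}
        for name, producers in self.deps.items():
            remaining[name] = len(producers)
            for p in producers:
                if p not in self.funcs:
                    raise ValueError(f"task {name!r} depends on unknown task {p!r}")
                consumers[p].append(name)

        results = {}
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            def launch(name):
                # All producers have finished, so their results are available.
                args = [results[p] for p in self.deps[name]]
                return pool.submit(self.funcs[name], *args)

            # Tasks with no inputs are ready immediately.
            running = {launch(n): n for n, count in remaining.items() if count == 0}
            while running:
                done, _ = wait(running, return_when=FIRST_COMPLETED)
                for fut in done:
                    name = running.pop(fut)
                    results[name] = fut.result()
                    # A finished task may make some of its consumers ready.
                    for c in consumers[name]:
                        remaining[c] -= 1
                        if remaining[c] == 0:
                            running[launch(c)] = c

        if len(results) != len(self.funcs):
            raise ValueError("graph contains a dependency cycle")
        return results


if __name__ == "__main__":
    def recv_from_neighbor():
        time.sleep(0.05)  # stands in for waiting on a message
        return 2

    g = TaskGraph()
    g.add_task("recv", recv_from_neighbor)                    # communication task
    g.add_task("local", lambda: 10)                           # needs no messages
    g.add_task("scale", lambda a: a * 3, deps=["recv"])
    g.add_task("total", lambda x, y: x + y, deps=["scale", "local"])
    print(g.run()["total"])  # 16
```

Here `local` has no edge to `recv`, so it can run while `recv` is still sleeping. `scale` cannot start until the simulated message arrives. The abstract describes a real translator and an external runtime system, both of which are far more involved. This sketch shows only the partial-order idea.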
Bibliographic record

  • Author: Nguyen Thanh, Nhat Tan
  • Author affiliation: University of California, San Diego
  • Degree-granting institution: University of California, San Diego
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 201
  • Original format: PDF
  • Language: English
