Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

Abstract

As the complexity of machines and architectures has increased, performance tuning has become more challenging, leading to the failure of general compilers to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, the complexity of programs has also increased, with modern programs built on a variety of abstraction layers to manage complexity, yet these layers hinder efforts at optimization. In fact, it is common to lose one or two additional orders of magnitude in performance when going from a low-level language such as Fortran or C to a high-level language like Python, Ruby, or Matlab.

General purpose compilers are limited by the inability of program analysis to determine programmer intent, as well as the lack of detailed performance models that always determine the best executable code for a given computation and architecture. The latter problem can be mitigated through auto-tuning, which generates many code variants for a particular problem and empirically determines which performs best on a given architecture.

This thesis addresses the problem of how to write programs at a high level while obtaining the performance of code written by performance experts at the low level. To do so, we build domain-specific embedded languages (DSELs) that generate low-level parallel code from a high-level language, and then use auto-tuning to determine the best performing low-level code. Such DSELs avoid analysis by restricting the domain while ensuring programmers specify high-level intent, and by performing empirical auto-tuning instead of modeling machine parameters. As a result, programmers write in high-level languages with portions of their code using DSELs, yet obtain performance equivalent to the best hand-optimized low-level code, across many architectures.

We present a methodology for building such auto-tuned DSELs, as well as a software infrastructure and example DSELs using the infrastructure, including a DSEL for structured grid computations and two DSELs for graph algorithms. The structured grid DSEL obtains over 80% of peak performance for a variety of benchmark kernels across different architectures, while the graph algorithm DSELs mitigate all performance loss due to using a high-level language. Overall, the methodology, infrastructure, and example DSELs point to a promising new direction for obtaining high performance while programming in a high-level language.
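The empirical auto-tuning idea described in the abstract (generate several variants of one computation, time each on the target machine, keep the fastest) can be illustrated with a minimal Python sketch. This is not the thesis's infrastructure: the names `make_stencil_variant`, `autotune`, and `BLOCK_SIZES` are hypothetical, the variants are plain Python closures rather than generated low-level parallel code, and a 1D three-point averaging stencil merely stands in for a structured grid kernel.

```python
import time


def make_stencil_variant(block_size):
    """Build one variant: a 3-point averaging stencil that walks the grid in blocks.

    Hypothetical stand-in for a generated code variant; block_size is the
    tuning parameter that distinguishes one variant from another.
    """
    def stencil(grid):
        n = len(grid)
        out = list(grid)  # boundary cells are copied unchanged
        for start in range(1, n - 1, block_size):
            stop = min(start + block_size, n - 1)
            for i in range(start, stop):
                out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
        return out
    return stencil


def autotune(variants, sample_input, repeats=3):
    """Time every variant on sample_input; return (best_key, timings).

    The choice is purely empirical: no machine model is consulted.
    """
    timings = {}
    for key, fn in variants.items():
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(sample_input)
            best = min(best, time.perf_counter() - t0)
        timings[key] = best  # keep the fastest of the repeated runs
    best_key = min(timings, key=timings.get)
    return best_key, timings


# Assumed tuning space for illustration only.
BLOCK_SIZES = [8, 64, 512]
variants = {b: make_stencil_variant(b) for b in BLOCK_SIZES}
grid = [float(i % 7) for i in range(10_000)]

best_block, timings = autotune(variants, grid)
tuned_stencil = variants[best_block]  # the variant to use from now on
```

All variants compute the same result, so the selection affects only running time. Which block size wins depends on the machine the sketch runs on, which is the point of measuring rather than modeling.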

Bibliographic record

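  • Author: Kamil, Shoaib Ashraf
  • Author affiliation: University of California, Berkeley
  • Degree-granting institution: University of California, Berkeley
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2012
  • Pagination: 181 p.
  • Total pages: 181
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: (none listed)
  • Keywords: (none listed)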

