Productive Programming Systems for Heterogeneous Supercomputers

Abstract

The majority of today's scientific and data analytics workloads still run on relatively energy-inefficient, heavyweight, general-purpose processing cores, often referred to in the literature as latency-oriented architectures. The programmer aids these architectures include (e.g., large and deep cache hierarchies, branch prediction logic, and prefetch logic) make them flexible enough to run a wide range of applications fast. However, we have started to see growth in the use of lightweight, simpler, energy-efficient, and functionally constrained cores. These architectures are commonly referred to as throughput-oriented.

Within each shared-memory node, the computational backbone of future throughput-oriented HPC machines will consist of large pools of lightweight cores. The first wave of throughput-oriented computing came in the mid-2000s with the use of GPUs for general-purpose and scientific computing. Today we are entering the second wave, with the introduction of NVIDIA Pascal GPUs, Intel Knights Landing Xeon Phi processors, the Epiphany Co-Processor, the Sunway MPP, and other throughput-oriented architectures that enable pre-exascale computing. However, while the majority of the FLOPS in designs for future HPC systems come from throughput-oriented architectures, these architectures are still commonly paired with latency-oriented cores, which handle management functions and lightweight or unparallelizable computational kernels. Hence, most future HPC machines will be heterogeneous in their processing cores.

However, the heterogeneity of future machines will not be limited to the processing elements: it will also exist in the storage, networking, memory, and software stacks of future supercomputers. As a result, it will be necessary to combine many different programming models and libraries in a single application. How to do so in a programmable and well-performing manner is an open research question. This thesis addresses this question using two approaches.

First, we explore using managed runtimes on HPC platforms. Because of their high-level programming models, managed runtimes have a long history of supporting data analytics workloads on commodity hardware, but they often come with overheads that make them less common in the HPC domain. Managed runtimes are also not supported natively on throughput-oriented architectures.

Second, we explore the use of a modular programming model and a work-stealing runtime to compose the programming and scheduling of multiple third-party HPC libraries. This approach leverages existing investment in HPC libraries, unifies the scheduling of work on a platform, and is designed to quickly support new programming model and runtime extensions.

In support of these two approaches, this thesis also makes novel contributions in tooling for future supercomputers. We demonstrate the value of checkpoints as a software development tool on current and future HPC machines, and present novel techniques for performance prediction across heterogeneous cores.
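The abstract names a work-stealing runtime as the mechanism that unifies scheduling across third-party libraries but does not describe its internals. As a generic illustration of the classic work-stealing policy (not the thesis's actual runtime), the Python sketch below simulates it deterministically in a single thread. Each worker keeps its own deque and runs its newest task from one end. When a worker is idle, it steals the oldest task from the other end of another worker's deque. The `Task` and `Worker` classes, the `simulate` function, and the `blas_*`/`fft_*` task names are hypothetical stand-ins for work submitted by two libraries.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    # Running a task may spawn child tasks (e.g. a library call split into pieces).
    run: Callable[[], List["Task"]] = lambda: []


class Worker:
    def __init__(self, wid: int) -> None:
        self.wid = wid
        self.tasks: deque = deque()  # this worker's double-ended task queue
        self.executed: List[str] = []


def simulate(workers: List[Worker], max_rounds: int = 1000) -> None:
    """Step every worker once per round until all deques are empty."""
    n = len(workers)
    for _ in range(max_rounds):
        if all(not w.tasks for w in workers):
            return
        for w in workers:
            if w.tasks:
                # Owner takes its most recently spawned task (LIFO, bottom end).
                task = w.tasks.pop()
            else:
                # Idle worker becomes a thief: scan the other workers and take the
                # oldest task from the first non-empty deque (FIFO, top end).
                task = None
                for k in range(1, n):
                    victim = workers[(w.wid + k) % n]
                    if victim.tasks:
                        task = victim.tasks.popleft()
                        break
                if task is None:
                    continue
            w.executed.append(task.name)
            # Children are pushed onto the executing worker's own deque.
            w.tasks.extend(task.run())


# Hypothetical usage: a root task fans out work from two made-up libraries.
def root() -> List[Task]:
    return [Task("blas_0"), Task("blas_1"), Task("fft_0"), Task("fft_1")]


workers = [Worker(0), Worker(1)]
workers[0].tasks.append(Task("root", root))
simulate(workers)
print(workers[0].executed)  # ['root', 'fft_1', 'fft_0']
print(workers[1].executed)  # ['blas_0', 'blas_1']
```

In divide-and-conquer workloads, stealing the oldest task tends to hand thieves coarser-grained work, while the owner keeps its recently spawned, cache-warm tasks. A real runtime would run workers as concurrent threads over a lock-free deque (such as the Chase-Lev design) rather than this round-robin loop.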

Bibliographic record

  • Author: Grossman, Max
  • Author affiliation: Rice University
  • Degree-granting institution: Rice University
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 234
  • Original format: PDF
  • Language of text: English