Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

Abstract

The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify ways to increase the energy efficiency of HPC systems, including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines a quad-core ARM CPU with an octa-core Digital Signal Processor (DSP). It was first released in 2012.

Four issues are considered: i) maximizing the Keystone II ARM CPU performance; ii) implementation and extension of the OpenMP programming model for the Keystone II; iii) simultaneous use of ARM and DSP cores across multiple Keystone SoCs; and iv) an energy model for applications running on LPSoCs like the Keystone II and heterogeneous systems in general.

Maximizing the performance of the ARM CPU on the Keystone II system is fundamental to adoption of this system by the HPC community and of the ARM architecture more broadly. Key to achieving good performance is exploitation of the ARM vector instructions. This thesis presents the first detailed comparison of the use of ARM compiler intrinsic functions with automatic compiler vectorization across four generations of ARM processors. Comparisons are also made with x86-based platforms and the use of equivalent Intel vector instructions.

Implementation of the OpenMP programming model on the Keystone II system presents both challenges and opportunities. The challenge is that the OpenMP model was originally developed for a homogeneous programming environment with a common instruction set architecture, and in 2012 work had only just begun to consider how OpenMP might work with accelerators. The opportunity is that shared memory is accessible to all processing elements on the LPSoC, offering performance advantages over what typically exists with attached accelerators. This thesis presents an analysis of a prototype version of OpenMP implemented as a bare-metal runtime on the DSP of a Keystone I system. An implementation for the Keystone II that maps OpenMP 4.0 accelerator directives to OpenCL runtime library operations is presented and evaluated. Exploitation of some of the underlying hardware features of the Keystone II is also discussed.

Simultaneous use of the ARM and DSP cores across multiple Keystone II boards is fundamental to the creation of commercially viable HPC offerings based on Keystone technology. The nCore BrownDwarf and HPE Moonshot are two such systems. This thesis presents a proof-of-concept implementation of matrix multiplication (GEMM) for the BrownDwarf system. The BrownDwarf utilizes both Keystone II and Keystone I SoCs through a point-to-point interconnect called Hyperlink. Details are provided of how a novel message passing communication framework across Hyperlink was implemented to support this complex environment.

An energy model that predicts energy usage as a function of the fraction of a particular computation performed on each of the available compute devices offers the opportunity to make runtime decisions on how best to minimize energy usage. This thesis presents a basic energy usage model that considers the rate of execution on each device and its active and idle power usage. Using this model, it is shown that only under certain conditions does there exist an energy-optimal work partition that uses multiple compute devices. To validate the model, a high-resolution energy measurement environment is developed and used to gather energy measurements for a matrix multiplication benchmark running on a variety of systems. The results presented support the model.

Drawing on the four issues noted above and other developments that have occurred since the Keystone II system was first announced, the thesis concludes with comments on the future of LPSoCs as building blocks for HPC systems.
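
The abstract describes the energy model only in words, so the following is a minimal sketch of one way such a model could look, not the formulation used in the thesis. It assumes two devices that run concurrently, each with a constant execution rate, active power, and idle power, and that a device draws idle power while waiting for the other to finish. The names `Device`, `energy`, and `best_partition` are hypothetical. Under these assumptions the energy is piecewise linear in the work fraction, so the minimum lies at one of three points: all work on one device, all work on the other, or the split at which both finish together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description (all values are assumptions)."""
    rate: float          # work units completed per second
    active_power: float  # watts drawn while computing
    idle_power: float    # watts drawn while waiting


def energy(work: float, fraction: float, a: Device, b: Device) -> float:
    """Joules used when `fraction` of `work` runs on `a` and the rest on `b`."""
    t_a = fraction * work / a.rate
    t_b = (1.0 - fraction) * work / b.rate
    total = max(t_a, t_b)  # devices run concurrently; the slower one sets the runtime
    e_a = a.active_power * t_a + a.idle_power * (total - t_a)
    e_b = b.active_power * t_b + b.idle_power * (total - t_b)
    return e_a + e_b


def best_partition(work: float, a: Device, b: Device) -> float:
    """Fraction of work for `a` that minimizes energy under this sketch model."""
    balanced = a.rate / (a.rate + b.rate)  # split at which both devices finish together
    # Energy is linear on each side of `balanced`, so only three points need checking.
    return min((0.0, balanced, 1.0), key=lambda f: energy(work, f, a, b))
```

With made-up numbers, a CPU of `Device(rate=10, active_power=8, idle_power=2)` and a DSP of `Device(rate=30, active_power=6, idle_power=1)` give a best partition of 0.0 for 300 work units, meaning everything runs on the DSP. Raising the CPU idle power to 7 W moves the optimum to the balanced split of 0.25. This mirrors the abstract's point that a multi-device partition is energy-optimal only under certain conditions.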

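Bibliographic record

  • Author

    Mitra, Gaurav.

  • Author affiliation

    The Australian National University (Australia).

  • Degree-granting institution: The Australian National University (Australia).
  • Subject: Computer science.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 226 p.
  • Total pages: 226
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Physiology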