
Locality-Driven Power Optimization Techniques for High-Performance Parallel Systems.


Abstract

The computational capabilities of high-performance computing (HPC) systems continue to improve, but at the cost of increased electrical power consumption. The environmental and economic impact of this growing power consumption is motivating research into techniques that can reduce HPC power consumption without significantly degrading overall system performance. As processor frequency increases have plateaued because of the power and thermal dissipation limits of high-density electronic components, the greatest improvement in computational performance has come from increasing hardware-level parallelism at the core, processor, node, and overall system level. This trend has, in turn, driven the increased use of parallel programming paradigms that can take advantage of the greater hardware capabilities. One major parallel programming paradigm is the Partitioned Global Address Space (PGAS), which uses a Single Program Multiple Data (SPMD) model. For most algorithm types, SPMD threads constantly communicate and synchronize across the various levels of hardware parallelism and intermittently stall because of the response latency of the remote thread(s) involved in communication or synchronization.

This dissertation describes research to reduce the energy wasted by these stalls by leveraging the locality-awareness principle to develop power-efficient optimization techniques. Two complementary types of power optimization techniques that can be applied to many common classes of high-performance computing applications are examined. These techniques are: i) intra-process locality-driven power optimizations, which offer programmers and system designers opportunities to control processor frequency and sleep states among threads of the same process; and ii) inter-process locality-driven power optimizations, which apply job mix co-placement (i.e., mapping running applications to CPU cores using specific affinity patterns) and co-scheduling (i.e., job ordering based on symbiosis) to threads of different and diverse processes that are executing together on an HPC cluster. The co-placement power optimization can reduce energy consumption by up to 25%.

The validation of the optimization techniques relied heavily on being able to correctly measure the power utilization of the CPU and memory subsystems. At the time we began our work on this topic, most investigation of power optimization for HPC systems was done using indirect methods such as estimation based on time or CPU performance counters. Instead, we developed a precise and scalable mechanism to directly measure discrete CPU and memory power consumption, closely synchronized with program execution time. Given the evolution of embedded power sensors in later-generation Intel® microprocessors, we also integrated our initial non-intrusive measurement system with an intrusive measurement system using the embedded sensors. The two measurement systems working in tandem generated a large volume of experimental output, so we applied Big Data techniques to process the raw data and a systematic framework to analyze the results. The measurement framework itself represents a significant contribution to the growing community of researchers in HPC power optimization.

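Bibliographic record

  • Author

    Newsom, David K.

  • Author affiliation

    The George Washington University

  • Degree-granting institution: The George Washington University
  • Discipline: Computer engineering
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 135 p.
  • Total pages: 135
  • Original format: PDF
  • Language of text: English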
