Published in: International Conference on High Performance Computing and Simulation

A Scalable Framework for Online Power Modelling of High-Performance Computing Nodes in Production

Abstract

Power and thermal design and management are critical components of high-performance computing (HPC) systems, because these systems combine high power density with large total power consumption. Many HPC power management strategies rely on accurate, compact power models that can predict power consumption and track its sensitivity to workload parameters and operating points. In this paper we describe a methodology and a framework for training power models, derived with two best-in-class procedures, directly online on in-production nodes and without requiring dedicated training instances. The compact power models are obtained using an online regression-based approach which can track non-stationary workloads and hardware variability. Our experiments on a real-life HPC system demonstrate that the models achieve very high accuracy over all operating modes. We also demonstrate the scalability of our approach and the small amount of resources needed for online modelling, in both the training and inference phases.
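
The abstract does not say which regression procedure the authors use. As a purely illustrative sketch of what an online regression-based power model can look like, the following uses recursive least squares with exponential forgetting, a standard technique for tracking slowly changing coefficients. The class name, the two features (utilization and frequency), the forgetting factor, and the synthetic data generator are all assumptions made for this example and do not come from the paper.

```python
class OnlineLinearPowerModel:
    """Recursive least squares with exponential forgetting.

    Illustrative only: this is a generic online regression, not the
    procedure from the paper. All names and defaults are assumptions.
    """

    def __init__(self, n_features, forgetting=0.99, init_cov=1e3):
        self.n = n_features + 1          # +1 for a constant (idle power) term
        self.lam = forgetting            # < 1 discounts old samples
        self.w = [0.0] * self.n          # model coefficients
        # P approximates the inverse of the weighted feature covariance.
        self.P = [[init_cov if i == j else 0.0 for j in range(self.n)]
                  for i in range(self.n)]

    def predict(self, features):
        x = [1.0] + list(features)
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, features, measured_power):
        """Fold one (features, power) sample into the model; return the prior error."""
        x = [1.0] + list(features)
        n = self.n
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        err = measured_power - sum(wi * xi for wi, xi in zip(self.w, x))
        self.w = [wi + g * err for wi, g in zip(self.w, gain)]
        # P <- (P - gain * x^T P) / lam; x^T P equals Px^T because P is symmetric.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return err


def synthetic_stream(n_samples, idle, k_util, k_freq):
    """Hypothetical noise-free samples: power = idle + k_util*util + k_freq*freq."""
    for i in range(n_samples):
        util = (i * 37 % 100) / 100.0        # utilization in [0, 1)
        freq = 1.0 + (i * 53 % 200) / 100.0  # frequency in [1, 3) GHz
        yield [util, freq], idle + k_util * util + k_freq * freq


model = OnlineLinearPowerModel(n_features=2)
for feats, power in synthetic_stream(500, idle=50.0, k_util=30.0, k_freq=20.0):
    model.update(feats, power)
```

Each update costs O(n^2) in the number of features and keeps no sample history, which is one way a model of this kind could stay cheap enough to run alongside production workloads. A forgetting factor below 1 lets the coefficients drift when the workload mix changes, at the cost of noisier estimates.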
