Practical Resource Usage Prediction Method for Large Memory Jobs in HPC Clusters

Abstract

Users of high performance computing (HPC) clusters often find it difficult to specify accurate resource estimates when running their applications as batch jobs. Prediction is a common way to reduce this burden: historical records of previous runs are used to estimate the resource usage of newly submitted jobs. Most existing resource prediction methods build a single model that covers all of the jobs in a cluster. In production, however, users tend to care mainly about the resource usage of jobs with particular patterns, e.g. jobs with large memory consumption. This paper proposes a practical resource prediction method for large memory jobs. The method first predicts whether a job is likely to use a large amount of memory, and then predicts its final memory usage with a model trained only on historical large memory jobs. Using several real-world job traces collected from large production clusters at IBM Spectrum LSF customer sites, the evaluation shows that average prediction errors can be reduced by up to 40% for nearly 90% of large memory jobs. Meanwhile, model training cost is reduced by over 30% for the evaluated job traces.
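
The two-stage idea in the abstract (classify first, then regress on large memory jobs only) can be sketched in Python as below. This is a minimal illustration under loudly stated assumptions, not the authors' method: the features (`user`, `queue`, `requested_mem_gb`), the 32 GB cutoff `LARGE_MEM_GB`, the per-(user, queue) majority vote used as the classifier, and the ordinary least-squares line used as the regressor are all hypothetical stand-ins, since the abstract does not name the features or models used.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Hypothetical cutoff for a "large memory" job; the abstract does not define one.
LARGE_MEM_GB = 32.0


@dataclass
class Job:
    user: str                 # hypothetical feature
    queue: str                # hypothetical feature
    requested_mem_gb: float   # hypothetical feature: memory the user asked for
    used_mem_gb: float = 0.0  # observed peak usage, known only for finished jobs


class TwoStagePredictor:
    """Stage 1: is the job likely large-memory? Stage 2: regress usage on large jobs only."""

    def fit(self, history: list[Job]) -> "TwoStagePredictor":
        # Stage 1 (stand-in classifier): per-(user, queue) fraction of large jobs.
        counts = defaultdict(lambda: [0, 0])  # key -> [n_large, n_total]
        for job in history:
            c = counts[(job.user, job.queue)]
            c[0] += int(job.used_mem_gb >= LARGE_MEM_GB)
            c[1] += 1
        self.large_ratio = {k: n_large / n_total for k, (n_large, n_total) in counts.items()}
        self.global_ratio = sum(c[0] for c in counts.values()) / len(history)

        # Stage 2 (stand-in regressor): fit used = a * requested + b by least squares,
        # using ONLY the historical large-memory jobs.
        large = [j for j in history if j.used_mem_gb >= LARGE_MEM_GB]
        if not large:
            raise ValueError("history contains no large-memory jobs")
        xs = [j.requested_mem_gb for j in large]
        ys = [j.used_mem_gb for j in large]
        mx, my = mean(xs), mean(ys)
        var_x = sum((x - mx) ** 2 for x in xs)
        cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = cov_xy / var_x if var_x > 0 else 0.0
        self.b = my - self.a * mx
        return self

    def is_large(self, job: Job) -> bool:
        # Unseen (user, queue) pairs fall back to the cluster-wide ratio.
        ratio = self.large_ratio.get((job.user, job.queue), self.global_ratio)
        return ratio >= 0.5

    def predict(self, job: Job) -> Optional[float]:
        # Jobs classified as small are left to the default path (e.g. the user's request).
        if not self.is_large(job):
            return None
        return self.a * job.requested_mem_gb + self.b


# Usage with a tiny, made-up history.
history = [
    Job("alice", "bigmem", 64.0, 48.0),
    Job("alice", "bigmem", 128.0, 96.0),
    Job("alice", "bigmem", 96.0, 72.0),
    Job("bob", "normal", 8.0, 2.0),
    Job("bob", "normal", 16.0, 3.0),
]
model = TwoStagePredictor().fit(history)
print(model.predict(Job("alice", "bigmem", 80.0)))  # 60.0
print(model.predict(Job("bob", "normal", 8.0)))     # None
```

Training the regressor only on jobs above the cutoff keeps the many small jobs from pulling the fitted line toward low values, and it also shrinks the training set, which is consistent in spirit with the lower training cost the abstract reports.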

机译：高性能计算（HPC）集群中的用户通常面临挑战，以指定以批处理作业运行其应用程序的准确资源估计。预测是通过使用以前运行的历史记录来减轻这种复杂性来估计新的即将到来的工作的资源使用量来缓解这种复杂性的常见方法。大多数现有资源预测方法直接构建单个模型，以考虑群集中的所有作业。然而，生产使用中的人们倾向于专注于具有某些模式的工作的资源使用，例如，内存消耗大的工作。本文提出了一种用于大型内存作业的实用资源预测方法。所提出的方法首先尝试预测作业是否倾向于使用大的内存大小，然后使用仅由历史大存储作业训练的模型来预测最终内存使用。使用来自IBM Spectrum LSF客户站点的大型生产群集收集的几个实际工作迹线，评估结果表明，平均预测误差可降低高达40％的大内存作业。同时，评估的作业迹线，模型培训成本可能会超过30％。

Bibliographic details

Source: Asian Conference on Supercomputing Frontiers, 2019, p. 105 (17 pages)
Authors: Xiuqiao Li; Nan Qi; Yuanyuan He; Bill McMillan
Original format: PDF
CLC classification: TP301-53
Keywords: Resource usage prediction; Large memory jobs; Resource manager

机译：资源使用预测;大记忆工作;资源经理;

Similar literature

1. Mantis: Efficient Predictions of Execution Time, Energy Usage, Memory Usage and Network Usage on Smart Mobile Devices [J]. Kwon Yongin, Lee Sangmin, Yi Hayoon. IEEE Transactions on Mobile Computing, 2015, no. 10.

机译：螳螂：智能移动设备上的执行时间，能源使用量，内存使用量和网络使用量的有效预测
2. MPI jobs within MPI jobs: A practical way of enabling task-level fault-tolerance in HPC workflows [J]. Wozniak Justin M., Dorier Matthieu, Ross Robert. Future Generation Computer Systems, 2019.

机译：MPI作业中的MPI作业：在HPC工作流程中启用任务级容错的实用方法
3. A hybrid HPC/cloud distributed infrastructure: Coupling EC2 cloud resources with HPC clusters to run large tightly coupled multiscale applications [J]. Mohamed Ben Belgacem, Bastien Chopard. Future Generation Computer Systems, 2015.

机译：混合HPC /云分布式基础架构：将EC2云资源与HPC集群耦合，以运行大型紧密耦合的多规模应用程序
4. Practical Resource Usage Prediction Method for Large Memory Jobs in HPC Clusters [C]. Xiuqiao Li, Nan Qi, Yuanyuan He. Asian Conference on Supercomputing Frontiers, 2019.

机译：HPC群集中大内存作业的实用资源使用预测方法
5. Job co-allocation strategies in multiple HPC clusters [D]. Qin, Jinhui. 2009.

机译：多个HPC群集中的作业协同分配策略。
6. Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers [O]. Mohamed S. Halawa, Rebeca P. Díaz Redondo, Ana Fernández Vilas. 2020.

机译：无监督的基于KPIS的HPC数据中心群体
7. Evaluating scalability and efficiency of the Resource and Job Management System on large HPC Clusters [O]. Yiannis Georgiou, Matthieu Hautreux. 2015.

机译：评估大型HpC群集上资源和作业管理系统的可扩展性和效率

Practical Resource Usage Prediction Method for Large Memory Jobs in HPC Clusters

摘要

著录项

相似文献

相关主题

期刊订阅