Conference: Asian Conference on Supercomputing Frontiers

Practical Resource Usage Prediction Method for Large Memory Jobs in HPC Clusters

Abstract

Users of high performance computing (HPC) clusters often struggle to specify accurate resource estimates when running their applications as batch jobs. Prediction is a common way to ease this burden: historical records of previous runs are used to estimate the resource usage of incoming jobs. Most existing resource prediction methods build a single model that covers all jobs in a cluster. In production, however, people tend to care only about the resource usage of jobs with certain patterns, e.g. jobs with large memory consumption. This paper proposes a practical resource prediction method for large memory jobs. The proposed method first predicts whether a job is likely to use a large amount of memory, and then predicts the final memory usage with a model trained only on historical large memory jobs. Using several real-world job traces collected from large production clusters at IBM Spectrum LSF customer sites, the evaluation shows that the average prediction error can be reduced by up to 40% for nearly 90% of large memory jobs. Meanwhile, the model training cost can be reduced by more than 30% for the evaluated job traces.
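The two-stage idea described in the abstract (classify first, then estimate with a model trained only on large memory jobs) can be illustrated with a minimal sketch. The abstract does not name the models, features, or the threshold that defines a "large" job, so everything below is an assumption for illustration: the `TwoStageMemoryPredictor` class, the `(user, app)` grouping key, the 32 GB cutoff, and the use of simple per-key frequencies and means in place of real learned models.

```python
from collections import defaultdict
from statistics import mean

LARGE_MEM_GB = 32.0  # assumed threshold; the abstract does not give one


class TwoStageMemoryPredictor:
    """Hypothetical sketch: stage 1 flags likely large-memory jobs,
    stage 2 estimates their usage from large-memory history only."""

    def fit(self, history):
        # history: iterable of (user, app, peak_mem_gb) for finished jobs
        flags = defaultdict(list)      # key -> was each past job large?
        large_mem = defaultdict(list)  # key -> peak memory of large jobs only
        for user, app, mem_gb in history:
            key = (user, app)
            is_large = mem_gb >= LARGE_MEM_GB
            flags[key].append(is_large)
            if is_large:
                large_mem[key].append(mem_gb)
        # Stage 1 stand-in: fraction of past jobs per key that were large.
        self.large_rate = {k: sum(v) / len(v) for k, v in flags.items()}
        # Stage 2 stand-in: mean usage, computed from large jobs only.
        self.large_mean = {k: mean(v) for k, v in large_mem.items()}
        return self

    def predict(self, user, app):
        key = (user, app)
        # Stage 1: classify. Unseen keys default to "not large".
        if self.large_rate.get(key, 0.0) < 0.5:
            return None  # not predicted large; no estimate is produced
        # Stage 2: a key that passes stage 1 has at least one large job.
        return self.large_mean[key]


history = [
    ("alice", "sim", 64.0), ("alice", "sim", 72.0), ("alice", "sim", 2.0),
    ("bob", "post", 1.0), ("bob", "post", 40.0), ("bob", "post", 1.5),
]
model = TwoStageMemoryPredictor().fit(history)
print(model.predict("alice", "sim"))  # estimate from the two large runs only
print(model.predict("bob", "post"))   # mostly small runs, so no estimate
```

Because the second stage sees only large memory records, the small 2 GB run does not drag down the estimate for `("alice", "sim")`, and the second-stage model is fit on fewer records than a single model over all jobs would be. A real system would presumably replace both stand-ins with trained classifiers and regressors over richer job features.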
