Predicting Job Completion Times Using System Logs in Supercomputing Clusters

Abstract

Most large systems, such as HPC and cloud computing clusters and data centers, are built from commercial off-the-shelf components. System logs are usually the main source of insight into system issues, so mining logs to diagnose anomalies has been an active research area. Because the logs of commodity PC clusters lack organization and semantic consistency, what constitutes a fault or an error is subjective, and building an automatic failure prediction model from log messages is therefore hard. In this paper we sidestep the difficulty by asking a different question: given the concomitant system log messages of a running job, can we predict the job's remaining time? We adopt a Hidden Markov Model (HMM) coupled with frequency analysis to achieve this. Our HMM approach can predict 75% of jobs' remaining times with an error of less than 200 seconds.
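
The abstract does not describe the model in detail, so the following is only a minimal sketch of the general idea of using an HMM to estimate remaining time from a stream of log observations. It is not the authors' method. The three job phases, every probability, the log-message categories, and the per-state mean remaining times are all invented for illustration. The sketch runs the standard forward (filtering) recursion to get a belief over hidden states, then takes the belief-weighted average of the assumed remaining times.

```python
# Toy illustration only: states, probabilities, and per-state remaining
# times are invented for this sketch and do not come from the paper.

STATES = ["early", "middle", "late"]      # hypothetical job phases
START = [0.8, 0.15, 0.05]                 # P(initial state)
TRANS = [                                 # P(next state | current state)
    [0.7, 0.3, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
# P(observed log-message category | state). Categories 0..2 are
# hypothetical buckets, e.g. messages grouped by how often they occur.
EMIT = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
MEAN_REMAINING = [3000.0, 1200.0, 150.0]  # seconds, assumed per state


def normalize(values):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("observation sequence has zero probability")
    return [v / total for v in values]


def filter_states(observations):
    """Forward algorithm with per-step normalization.

    Returns P(current hidden state | observations so far).
    """
    if not observations:
        raise ValueError("need at least one observation")
    n = len(STATES)
    belief = normalize([START[s] * EMIT[s][observations[0]] for s in range(n)])
    for obs in observations[1:]:
        # Predict step: push the belief through the transition matrix.
        predicted = [sum(belief[p] * TRANS[p][s] for p in range(n))
                     for s in range(n)]
        # Update step: weight by the likelihood of the new observation.
        belief = normalize([predicted[s] * EMIT[s][obs] for s in range(n)])
    return belief


def expected_remaining_seconds(observations):
    """Belief-weighted average of the assumed per-state remaining times."""
    belief = filter_states(observations)
    return sum(b * t for b, t in zip(belief, MEAN_REMAINING))


if __name__ == "__main__":
    print(expected_remaining_seconds([0, 0, 1, 2, 2]))
```

In a real setting the transition and emission probabilities would be learned from historical job logs rather than written by hand, and the mapping from raw log messages to observation categories would need its own preprocessing step.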

Bibliographic record

Source
Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops | 2013 | 8 pages
Authors
Xin Chen; Charng-Da Lu; Karthik Pattabiraman
Original format
PDF
Chinese Library Classification
TP393-53
Keywords
Log Analysis; Prediction; Hidden Markov Model
Indexed
2022-08-20 20:01:05
