Conference: IEEE International Conference on Cluster Computing

Digging deeper into cluster system logs for failure prediction and root cause diagnosis

Abstract

As the sizes of supercomputers and data centers grow towards exascale, failures become normal. System logs play a critical role in the increasingly complex tasks of automatic failure prediction and diagnosis. Many methods for failure prediction are based on analyzing event logs for large scale systems, but there is still neither a widely used one to predict failures based on both non-fatal and fatal events, nor a precise one that uses fine-grained information (such as failure type, node location, related application, and time of occurrence). A deeper and more precise log analysis technique is needed. We propose a three-step approach to draw out event dependencies and to identify failure-event generating processes. First, we cluster frequent event sequences into event groups based on common events. Then we infer causal dependencies between events in each event group. Finally, we extract failure rules based on the observation that events of the same event types, on the same nodes or from the same applications have similar operational behaviors. We use this rich information to improve failure prediction. Our approach semi-automates diagnosing the root causes of failure events, making it a valuable tool for system administrators.
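The first step described above, clustering frequent event sequences into event groups based on common events, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the frequent sequences have already been mined, that two sequences belong to the same group whenever they share at least one event type, and that the event names (ECC_WARN, MEM_FATAL, and so on) are hypothetical placeholders. The function name group_sequences is likewise invented for illustration.

```python
from collections import defaultdict


def group_sequences(sequences):
    """Merge event sequences that share at least one event type.

    Assumption (not from the paper): "based on common events" is read
    as "any shared event type puts two sequences in the same group".
    Returns a sorted list of groups, each a sorted list of event types.
    """
    # Union-find over sequence indices.
    parent = list(range(len(sequences)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Link each sequence to the first sequence in which each of its
    # events appeared.
    first_seen = {}
    for idx, seq in enumerate(sequences):
        for event in seq:
            if event in first_seen:
                union(idx, first_seen[event])
            else:
                first_seen[event] = idx

    # Collect the union of event types for every connected component.
    groups = defaultdict(set)
    for idx, seq in enumerate(sequences):
        groups[find(idx)].update(seq)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical frequent sequences, for illustration only.
    frequent = [
        ("ECC_WARN", "MEM_FATAL"),
        ("MEM_FATAL", "NODE_DOWN"),
        ("LINK_RETRY", "NET_FATAL"),
    ]
    for group in group_sequences(frequent):
        print(group)
```

In this sketch the first two sequences merge because both contain MEM_FATAL, while the network-related sequence stays in its own group. Inferring causal dependencies within each group and extracting failure rules (the second and third steps) would operate on these groups and are not shown here.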
