Conference: International conference on cloud computing; World Congress on Services

An Approach to Failure Prediction in Cluster by Self-updating Cause-and-Effect Graph


Abstract

Cluster systems are widely used in cloud computing, high-performance computing, and other fields, and both their usage and their scale have grown sharply. Unfortunately, larger cluster systems are more prone to failures, and repairing those failures is difficult and costly, which makes failure prediction in cluster systems important. To address this challenge, we propose an approach to failure prediction in cluster systems based on a Self-Updating Cause-and-Effect Graph. Unlike previous approaches, ours automatically mines the causality among log events from cluster systems, and builds and updates the Cause-and-Effect Graph used for failure prediction throughout the systems' life cycle. We use real logs from the Blue Gene/L system to verify the effectiveness of our approach and to compare it with other approaches on the same logs. The results show that our approach outperforms the other approaches, with its best precision and recall reaching 89% and 85%, respectively.
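The abstract does not give the mining algorithm, so the following is only a minimal sketch of what a self-updating cause-and-effect graph over log events could look like, not the authors' method. It assumes a simple rule: event type A is treated as a candidate cause of event type B when B follows A within a fixed time window, and the edge weight is the fraction of A occurrences that were followed by B. The class name, the `window`, `min_confidence`, and `min_support` parameters, and the event names in the usage lines are all hypothetical.

```python
from collections import defaultdict, deque


class CauseEffectGraph:
    """Toy cause-and-effect graph over log event types (illustrative only)."""

    def __init__(self, window=60.0, min_confidence=0.5, min_support=2):
        self.window = window                  # seconds in which an effect may follow a cause
        self.min_confidence = min_confidence  # minimum edge weight used for prediction
        self.min_support = min_support        # minimum number of observations of an edge
        self.event_count = defaultdict(int)   # occurrences per event type
        self.pair_count = defaultdict(int)    # (cause, effect) -> cause occurrences followed by effect
        self.recent = deque()                 # (timestamp, event, effects already credited)

    def update(self, timestamp, event):
        """Fold one log event into the graph; timestamps must be non-decreasing."""
        # Forget events that are too old to be a cause of the new one.
        while self.recent and timestamp - self.recent[0][0] > self.window:
            self.recent.popleft()
        # Credit each earlier occurrence at most once per effect type,
        # so the confidence below never exceeds 1.
        for _, cause, credited in self.recent:
            if cause != event and event not in credited:
                credited.add(event)
                self.pair_count[(cause, event)] += 1
        self.event_count[event] += 1
        self.recent.append((timestamp, event, set()))

    def confidence(self, cause, effect):
        """Fraction of `cause` occurrences followed by `effect` within the window."""
        n = self.event_count.get(cause, 0)
        return self.pair_count.get((cause, effect), 0) / n if n else 0.0

    def predict(self, event):
        """Return (effect, confidence) pairs expected to follow `event`, strongest first."""
        out = []
        for (cause, effect), n in self.pair_count.items():
            if cause != event or n < self.min_support:
                continue
            c = self.confidence(cause, effect)
            if c >= self.min_confidence:
                out.append((effect, c))
        return sorted(out, key=lambda pair: -pair[1])


# Usage with made-up event names (not taken from the Blue Gene/L logs).
g = CauseEffectGraph(window=60.0)
stream = [(0, "ECC_WARN"), (10, "NODE_FAIL"), (100, "ECC_WARN"),
          (110, "NODE_FAIL"), (200, "ECC_WARN"), (300, "LINK_UP")]
for ts, ev in stream:
    g.update(ts, ev)
print(g.predict("ECC_WARN"))
```

Because `update` adjusts the counts on every incoming event, the edge weights keep tracking the log stream as the system ages, which is the "self-updating" property in this sketch. A real system would also need decay of stale edges and a more careful causality test than temporal co-occurrence.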