Venue: IEEE/IFIP Network Operations and Management Symposium

Identifying Symptoms of Recurrent Faults in Log Files of Distributed Information Systems


Abstract

The manual process of identifying causes of failure in distributed information systems is difficult and time-consuming. The underlying reason is the large size and complexity of these systems and the vast amount of monitoring data they generate. Despite its high cost, this manual process is necessary to avoid the detrimental consequences of system downtime. Several studies, as well as operator practice, suggest that a large fraction of the failures in these systems are caused by recurrent faults. Therefore, significant efficiency gains can be achieved by automating the identification of these faults. In this work we present methods that draw on information retrieval and machine learning to automate the task of inferring symptoms pertinent to failures caused by specific faults. In particular, we present a method to infer message types from plain-text log messages, and we use these types to train classifiers and extract rules that identify symptoms of recurrent faults automatically.
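The abstract does not say how message types are inferred, so the following is only a minimal sketch of one common approach, not the authors' method: mask the variable fields of each log line (IP addresses, hexadecimal values, numbers) and treat the remaining template as the message type. The masking rules, the function names `message_type` and `type_counts`, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption, not taken from the paper).
# Order matters: IP addresses and hex values are masked before plain numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def message_type(message: str) -> str:
    """Replace variable fields with placeholders and normalize whitespace."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return " ".join(message.split())


def type_counts(messages):
    """Count how often each inferred message type occurs."""
    return Counter(message_type(m) for m in messages)


# Invented sample log lines, for illustration only.
logs = [
    "Connection from 10.0.0.1 closed after 35 s",
    "Connection from 10.0.0.27 closed after 4 s",
    "Disk 3 failed at block 0x1f4a",
]
counts = type_counts(logs)
```

Under these assumptions, the per-type counts for a time window could serve as a feature vector for a classifier that labels the window with a known fault, which is one plausible reading of how message types feed the learning step.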
