Identifying Symptoms of Recurrent Faults in Log Files of Distributed Information Systems

Abstract

The manual process of identifying causes of failure in distributed information systems is difficult and time-consuming. The underlying reason is the large size and complexity of these systems and the vast amount of monitoring data they generate. Despite its high cost, this manual process is necessary to avoid the detrimental consequences of system downtime. Several studies and operator practice suggest that a large fraction of the failures in these systems are caused by recurrent faults. Therefore, significant efficiency gains can be achieved by automating the identification of these faults. In this work we present methods that draw on information retrieval and machine learning to automate the task of inferring symptoms pertinent to failures caused by specific faults. In particular, we present a method to infer message types from plain-text log messages, and we leverage these types to train classifiers and extract rules that identify symptoms of recurrent faults automatically.
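
The abstract does not say how message types are inferred, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple masking heuristic: tokens that look like parameters (IP addresses, hexadecimal values, integers) are replaced with placeholders, so messages that differ only in their parameters collapse to one type. The names `message_type`, `type_counts`, and the masking rules are hypothetical.

```python
import re
from collections import Counter

# Assumed masking rules (not taken from the paper): each pattern marks a
# kind of variable token that is replaced by a fixed placeholder.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def message_type(message: str) -> str:
    """Return a message-type template with variable tokens masked."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    # Normalize whitespace so spacing differences do not create new types.
    return " ".join(message.split())


def type_counts(messages):
    """Count how often each inferred message type occurs."""
    return Counter(message_type(m) for m in messages)


if __name__ == "__main__":
    # Made-up log lines, used only to exercise the functions.
    logs = [
        "Connection to 10.0.0.5 failed after 3 retries",
        "Connection to 10.0.0.7 failed after 12 retries",
        "Disk 0x1f full",
    ]
    for template, count in type_counts(logs).items():
        print(count, template)
```

Per-type counts like these could serve as input features for a classifier of the kind the abstract mentions; which features and classifiers the paper actually uses is not stated here.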

Bibliographic details

Source: IEEE/IFIP Network Operations and Management Symposium, 2010, 8 pages
Authors: Thomas Reidemeister; Mohammad A. Munawar; Paul A. S. Ward
Original format: PDF
Chinese Library Classification: TP393-53

Similar literature

1. An Efficient Web Log File Classification Techniques to Identify the Fault Data Identification Using Multi-Class Support Vector Machine Algorithm [J]. Kanna R. Rajesh. Journal of Computational and Theoretical Nanoscience. 2018, No. 9a10
2. Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to File-System Faults [J]. Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau. ACM Transactions on Storage. 2017, No. 3
3. Identifying System Errors through Web Server Log Files in Web Log Mining [J]. Arjun Ram Meghwal, Dr. Arvind K Sharma. International Journal of Computer Science and Technology. 2016, No. 3aVeraa1
4. Identifying symptoms of recurrent faults in log files of distributed information systems [C]. Reidemeister T., Munawar M.A., Ward P.A.S. Proceedings of the 2010 IEEE Network Operations and Management Symposium. 2010
5. Enabling efficient fault tolerance in distributed file systems through erasure codes [D]. Yu, Li. 2011
6. Log-Less Metadata Management on Metadata Server for Parallel File Systems [O]. Jianwei Liao, Guoqiang Xiao, Xiaoning Peng
7. Using Transparent Files in a Fault Tolerant Distributed File System [O]. Marcelo Madruga, Sergio Loest, Carlos Maziero. 2015
8. Distributed System Fault Tolerance Using Sender-Based Message Logging [R]. Johnson, D. B., Zwaenepoel, W. 1990