Supervised fault detection using unstructured server-log data to support root cause analysis

Abstract

Fault detection is one of the most important aspects of telecommunication networks. As communication networks grow in scale and complexity, maintenance and debugging have become extremely complicated and expensive. In complex systems, the large number of components leads to a higher failure rate, which increases the importance of both fault detection and root cause analysis. Fault detection for communication networks is based on analyzing system logs from servers or other network components in order to determine whether there is any unusual activity. However, detecting and diagnosing problems in such huge systems is a challenging task for humans, since the amount of information that needs to be processed goes far beyond what can be handled manually. There is therefore an immense demand for automatic processing of datasets to extract the data relevant to detecting anomalies. In a Big Data world, using machine learning techniques to analyze log data automatically is becoming increasingly popular. Machine-learning-based fault detection does not require prior knowledge about the types of problems and does not rely on explicit programming (such as rule-based systems). Machine learning can improve its performance automatically by learning from experience. In this thesis, we investigate supervised machine learning as a fast and efficient way to detect known faults from unstructured log data. Because the aim is to distinguish abnormal cases from normal ones, anomaly detection is treated as a binary classification problem. To extract numerical features from event logs, a primary step in any classification task, we use windowing together with bag-of-words approaches, which suit the textual characteristics of the logs (high dimensionality and sparseness).
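
The windowing and bag-of-words step described above could be sketched roughly as follows. This is a minimal illustration, assuming fixed-size windows measured in log lines and a simple alphabetic tokenizer; the function names, window size, step, and log lines are all invented here and are not taken from the thesis.

```python
import re
from collections import Counter


def tokenize(line):
    """Lower-case a log line and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", line.lower())


def window_features(lines, window_size, step):
    """Slide a window over the log and count the words inside each window.

    Each window becomes a sparse bag-of-words vector (a Counter).
    A log shorter than window_size yields a single, partial window.
    """
    windows = []
    last_start = max(len(lines) - window_size, 0)
    for start in range(0, last_start + 1, step):
        counts = Counter()
        for line in lines[start:start + window_size]:
            counts.update(tokenize(line))
        windows.append(counts)
    return windows


def to_dense(windows):
    """Project the sparse counts onto a shared, sorted vocabulary."""
    vocab = sorted(set().union(*windows))
    return vocab, [[w[term] for term in vocab] for w in windows]


# Hypothetical log lines, for illustration only.
log = [
    "INFO session opened for user",
    "INFO session closed for user",
    "ERROR link down on port",
    "ERROR link down on port",
]
windows = window_features(log, window_size=2, step=1)
vocab, matrix = to_dense(windows)
```

Most entries of `matrix` are zero even for this tiny log, which hints at the high dimensionality and sparseness the abstract mentions for real server logs.
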
We focus on linear classification methods, such as the single-layer perceptron and Support Vector Machines (SVMs), as promising candidates for supervised fault detection given the textual characteristics of network-based server-log data. To develop an approach that generalizes well in detecting known faults, we investigate two important factors: the size of the datasets and the duration of the faults. Based on the experimental results for these two factors, a two-layer classification is proposed to overcome the windowing and feature extraction challenges posed by long-lasting faults. The thesis proposes a novel approach for collecting feature vectors for the two layers. In the first layer, we attempt to detect the starting line of each fault repetition as well as the fault duration. The models obtained from the first layer are then used to create feature vectors for the second layer. To evaluate the learning algorithms and select the best detection model, cross-validation and F-scores are used, because traditional metrics such as accuracy and error rate are not well suited to imbalanced datasets. The experimental results show that the proposed SVM classifier provides the best performance regardless of fault duration, while factors such as the labelling rule and reduction of the feature space have no significant effect on performance. In addition, the results show that the two-layer classification system can improve fault detection performance; however, a better-suited approach for collecting feature vectors with a smaller time span needs further investigation.
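
To illustrate why the F-score is preferred over accuracy on imbalanced data, here is a small sketch that trains a single-layer perceptron on sparse bag-of-words vectors and compares it with a trivial predictor that always answers "normal". The toy data, function names, and epoch count are assumptions made for illustration. For brevity the sketch scores the model on its own training data, whereas the thesis uses cross-validation.

```python
def score(model, features):
    """Linear decision value w.x + b for a sparse feature dict."""
    weights, bias = model
    return bias + sum(weights.get(k, 0.0) * v for k, v in features.items())


def train_perceptron(samples, labels, epochs=10):
    """Classic perceptron rule; labels are +1 (fault) or -1 (normal)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            # Update only when the sample is on the wrong side (or on the boundary).
            if label * score((weights, bias), features) <= 0:
                for k, v in features.items():
                    weights[k] = weights.get(k, 0.0) + label * v
                bias += label
    return weights, bias


def predict(model, features):
    return 1 if score(model, features) > 0 else -1


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (fault) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: nine normal windows, one fault window.
normal = {"info": 2, "session": 2}
fault = {"error": 2, "link": 2}
samples = [normal] * 9 + [fault]
labels = [-1] * 9 + [1]

model = train_perceptron(samples, labels)
predictions = [predict(model, s) for s in samples]
baseline = [-1] * len(labels)  # always predicts "normal"

acc_baseline = accuracy(labels, baseline)
f1_baseline = f1_score(labels, baseline)
f1_model = f1_score(labels, predictions)
```

The always-normal baseline is right on nine of ten windows yet never finds the fault, so its F-score is zero. That gap is the reason accuracy alone can mislead when faults are rare.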

Bibliographic details

  • Author

    Abbaszadeh Zahra Jr;

  • Author affiliation
  • Year 2014
  • Total pages
  • Original format PDF
  • Language en
  • CLC classification
