Sensors (Basel, Switzerland)

LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things

Abstract

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec captures the relevance between words and maps each word to a vector. However, training word2vec is computationally expensive. Moreover, anomalies in logs depend not only on individual log messages but also on the sequence of log messages. The word vectors produced by word2vec therefore cannot be used directly: they must first be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid these multiple transformations, this paper proposes an offline feature extraction model, named LogEvent2vec, which takes log events as the input of word2vec in order to extract the relevance between log events and vectorize log events directly. LogEvent2vec can be combined with any coordinate transformation method and any anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors using bary or tf-idf, and train three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) to detect anomalies. We conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results show that, compared with word2vec, LogEvent2vec reduces computational time by a factor of 30 and improves accuracy. LogEvent2vec combined with bary and Random Forest achieves the best F1-score, while LogEvent2vec combined with tf-idf and Naive Bayes requires the least computational time.
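The coordinate transformation step, which turns log event vectors into one vector per log sequence, can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation. It assumes that "bary" means the unweighted mean (barycenter) of the event vectors in a sequence, and that the tf-idf variant is a tf-idf-weighted average of those vectors using a smoothed idf, log((1 + N) / (1 + df)) + 1, borrowed from scikit-learn's default rather than taken from the paper. The event IDs `E1` to `E3`, their 2-dimensional vectors, and the helper names (`bary`, `smoothed_idf`, `tfidf_vector`) are all made up for illustration; in LogEvent2vec the event vectors would come from word2vec trained with log event IDs as tokens.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def _dim(event_vecs: Dict[str, Vector]) -> int:
    # Assumption: all event vectors share one dimensionality.
    return len(next(iter(event_vecs.values())))


def bary(seq: Sequence[str], event_vecs: Dict[str, Vector]) -> Vector:
    """Unweighted mean (barycenter) of the event vectors in one log sequence."""
    total = [0.0] * _dim(event_vecs)
    for event in seq:  # every occurrence of an event contributes once
        for i, x in enumerate(event_vecs[event]):
            total[i] += x
    return [x / len(seq) for x in total]


def smoothed_idf(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Assumed smoothed idf per event: log((1 + N) / (1 + df)) + 1."""
    n = len(corpus)
    df: Counter = Counter()
    for seq in corpus:
        df.update(set(seq))  # count each event at most once per sequence
    return {e: math.log((1 + n) / (1 + d)) + 1.0 for e, d in df.items()}


def tfidf_vector(seq: Sequence[str], event_vecs: Dict[str, Vector],
                 idf: Dict[str, float]) -> Vector:
    """tf-idf-weighted average of the event vectors in one log sequence."""
    counts = Counter(seq)
    # tf = relative frequency of the event inside this sequence
    weights = {e: (c / len(seq)) * idf[e] for e, c in counts.items()}
    norm = sum(weights.values())  # > 0 because tf > 0 and the smoothed idf is >= 1
    out = [0.0] * _dim(event_vecs)
    for e, w in weights.items():
        for i, x in enumerate(event_vecs[e]):
            out[i] += w * x
    return [x / norm for x in out]


# Hypothetical event vectors standing in for word2vec output on event IDs.
event_vecs = {"E1": [1.0, 0.0], "E2": [0.0, 1.0], "E3": [1.0, 1.0]}
# Hypothetical log sequences, each a list of log event IDs.
corpus = [["E1", "E1", "E2"], ["E2", "E3"], ["E1", "E2"]]

idf = smoothed_idf(corpus)
seq_a = corpus[0]
print("bary  :", [round(x, 3) for x in bary(seq_a, event_vecs)])
print("tf-idf:", [round(x, 3) for x in tfidf_vector(seq_a, event_vecs, idf)])
```

In this toy corpus `E2` occurs in every sequence, so tf-idf down-weights it and moves the vector of `seq_a` toward the rarer event `E1` (first component about 0.72, versus about 0.667 for bary). The resulting sequence vectors would then be passed to a supervised classifier such as Random Forest or Naive Bayes, which this sketch does not cover.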
