Journal: Future Generation Computer Systems

Bigdata logs analysis based on seq2seq networks for cognitive Internet of Things


Abstract

While a bigdata system processes high-volume data at high speed, it also generates a large volume of logs. Predicting future events from these massive, multi-source, heterogeneous logs is difficult. This paper proposes a comprehensive method for smart computation and prediction over massive logs in the Internet of Things (IoT). Traditional machine learning, Hidden Markov Model (HMM), and Autoregressive Integrated Moving Average (ARIMA) methods are not accurate enough for predicting time-series data. In this work, we first describe the distributed collection and storage, event location, and vectorized representation of bigdata logs. Next, we present a log fusion algorithm that converts the logs (unstructured text data) of each component of a bigdata system into structured data by removing noise and adding timestamps and classification labels. Then, we introduce a predictive model for bigdata systems: we use an attention mechanism to improve the sequence-to-sequence (seq2seq) algorithm and add an adjustor to globally fit the data distribution. Our experimental results show that the neural network model trained with our method performs well on real-world data. Compared with the previous predictive method, the root mean square error (RMSE) is reduced by 46.65% and the R-squared (R²) goodness of fit is improved by 14.28%. © 2018 Elsevier B.V. All rights reserved.
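The abstract reports its results in terms of RMSE and R-squared. As a reading aid, the sketch below shows how these two metrics are conventionally computed for a predicted series. It is not the authors' evaluation code: the function names, the sample values, and the `relative_change_percent` helper are all hypothetical, and the abstract does not state whether its percentages are relative or absolute changes, so the relative form here is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when actual values are constant")
    return 1.0 - ss_res / ss_tot


def relative_change_percent(baseline, new):
    """Relative change from baseline to new, in percent (assumed convention)."""
    return (new - baseline) / baseline * 100.0


# Made-up values for illustration only; they are not data from the paper.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print("RMSE:", rmse(actual, predicted))
print("R-squared:", r_squared(actual, predicted))
```

Under the relative convention, a drop in RMSE from a baseline model to a new model shows up as a negative value from `relative_change_percent`, and a rise in R-squared as a positive one.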
