IEEE Transactions on Dependable and Secure Computing

Towards Automated Log Parsing for Large-Scale Log Data Analysis



Abstract

Logs are widely used in system management for dependability assurance because they are often the only data available that record detailed system runtime behaviors in production. Because log volumes are constantly growing, developers (and operators) aim to automate log analysis by applying data mining methods, which require structured input data (e.g., matrices). This has triggered a number of studies on log parsing, which aims to transform free-text log messages into structured events. However, due to the lack of open-source implementations of these log parsers and of benchmarks for performance comparison, developers are unlikely to be aware of the effectiveness and limitations of existing log parsers when applying them in practice. As a result, they often have to reimplement or redesign a parser, which is time-consuming and redundant. In this paper, we first present a characterization study of the current state-of-the-art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. We find that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle such data on a single computer. To address these limitations, we design and implement a parallel log parser (named POP) on top of Spark, a large-scale data processing platform. Comprehensive experiments have been conducted to evaluate POP on both synthetic and real-world datasets. The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.
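To illustrate the core idea of log parsing described above — collapsing free-text log messages into structured event templates — here is a minimal sketch in Python. This is not the POP algorithm from the paper; it is a simplified illustration that masks common variable fields (IP addresses, hex values, numbers) with a `<*>` wildcard, and the mask patterns and function names are assumptions for demonstration only.

```python
import re

# Simplified masking rules: replace variable tokens with a wildcard so that
# messages produced by the same logging statement collapse to one template.
# (Illustrative only; real parsers use more sophisticated heuristics.)
MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+(:\d+)?\b"), "<*>"),  # IP[:port]
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<*>"),             # hex values
    (re.compile(r"\b\d+\b"), "<*>"),                        # plain numbers
]

def to_template(message: str) -> str:
    """Replace variable fields in a raw log message with <*>."""
    for pattern, repl in MASKS:
        message = pattern.sub(repl, message)
    return message

def parse(messages):
    """Group raw log messages by event template, returning template -> count."""
    events = {}
    for msg in messages:
        tpl = to_template(msg)
        events[tpl] = events.get(tpl, 0) + 1
    return events
```

For example, the two messages `"Connection from 10.0.0.1:5050 closed after 120 seconds"` and `"Connection from 192.168.1.7:8020 closed after 45 seconds"` both reduce to the single template `"Connection from <*> closed after <*> seconds"`. A parallel parser such as POP distributes this kind of template extraction across a cluster rather than a single machine.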


