Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories

Abstracting log lines to log event types for mining software system logs

Abstract

Log files contain valuable information about the execution of a system. This information is often used for debugging, operational profiling, finding anomalies, detecting security threats, and measuring performance. Log files are usually too big for this information to be extracted manually, even though manual perusal is still one of the more widely used techniques. Recently, a variety of data mining and machine learning algorithms have been used to analyze the information in log files. A major roadblock to the efficient use of these algorithms is the inherent variability present in every log line of a log file. Each log line is a combination of a static message type field and a variable parameter field. Even though both fields are required, analysis algorithms often require that they be separated in order to find correlations among the repeating log event types. This disentangling of the message and parameter fields to find the event types is called abstraction of log lines. Each log line is abstracted to a unique ID, or event type, and the dynamic parameter value is extracted to give insight into the current state of the system. In this paper we present a log file abstraction technique based on the clustering technique used in the Simple Log file Clustering Tool. This solution is especially useful when we do not have access to the source code of the application or when the lines in the log file do not conform to a rigid structure. We evaluated our implementation on log files from the Virtual Computing Lab, a cloud computer management system at North Carolina State University, and abstracted them to 727 unique event types.
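
To make the idea of separating the static message field from the variable parameter field concrete, here is a minimal sketch in Python. It is not the paper's implementation: it only assumes a simplified frequent-word heuristic in the spirit of the clustering approach mentioned above, where a word that occurs at the same position in at least `support` lines is treated as static and every other word is treated as a parameter. The function name `abstract_log_lines`, the `support` threshold, and the sample log lines are all hypothetical and chosen for illustration.

```python
from collections import Counter


def abstract_log_lines(lines, support=2):
    """Map each log line to (event_id, template, parameters).

    Simplified illustration only: a word counts as "static" if it
    appears at the same position in at least `support` lines.
    """
    tokenized = [line.split() for line in lines]

    # Pass 1: count how often each word occurs at each position.
    counts = Counter(
        (pos, word)
        for tokens in tokenized
        for pos, word in enumerate(tokens)
    )

    # Pass 2: keep frequent words, replace the rest with a wildcard,
    # and assign one numeric ID per distinct template.
    event_ids = {}
    results = []
    for tokens in tokenized:
        template_parts = []
        params = []
        for pos, word in enumerate(tokens):
            if counts[(pos, word)] >= support:
                template_parts.append(word)
            else:
                template_parts.append("*")
                params.append(word)
        template = " ".join(template_parts)
        event_id = event_ids.setdefault(template, len(event_ids) + 1)
        results.append((event_id, template, params))
    return results


# Hypothetical log lines, invented for this example.
sample = [
    "connection from 10.0.0.1 closed",
    "connection from 10.0.0.2 closed",
    "user alice logged in",
    "user bob logged in",
]

for event_id, template, params in abstract_log_lines(sample):
    print(event_id, template, params)
```

On the sample input, the two connection lines share one event type and the two login lines share another, with the IP addresses and user names extracted as parameters. A real log file would likely need a tuned support threshold and handling for lines of differing lengths, which this sketch does not attempt.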