Improving the system log analysis with language model and semi-supervised classifier

Abstract

Mining the vast amount of server-side logging data is an essential step in boosting business intelligence and in facilitating system maintenance for multimedia or IoT-oriented services. Given the volume of the data repository, designers of logging-data analysis systems need to carefully balance processing speed against the accuracy of message classification. Conventional keyword-based log monitoring and classification is sufficiently fast, but does not scale well in complex systems, especially when the target system is built by a large group of developers who may each encode logging messages differently and whose messages often carry misleading labels. Conversely, many sophisticated approaches are so time-consuming that delayed processing jobs begin to accumulate, and they can hardly support timely decision-making. We also suggest that the design of a large-scale online log analysis system should follow the principle of requiring the least prior knowledge, under which unsupervised or semi-supervised solutions are preferred. In this paper, we propose a two-stage machine-learning-based method in which the system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then passed through a fine-grained classification procedure. Empirical studies on our web services show that our method has a clear advantage in terms of processing speed and classification accuracy.
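
The two-stage idea in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the language model, the direction of the threshold, or the classifier. The sketch assumes a word-bigram model with add-one smoothing, assumes that lines whose perplexity exceeds the threshold are the ones forwarded to stage two, and stands in a simple self-training nearest-neighbour labeller (Jaccard token overlap) for the semi-supervised classifier. All names (BigramModel, self_train_classify, analyze, min_sim) are hypothetical.

```python
import math
from collections import Counter


def tokenize(line):
    # Whitespace tokens, lowercased, wrapped in sentence-boundary markers.
    return ["<s>"] + line.lower().split() + ["</s>"]


class BigramModel:
    """Word-bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, lines):
        self.context = Counter()  # how often each token appears as a left context
        self.bigrams = Counter()  # how often each (previous, current) pair appears
        vocab = set()
        for line in lines:
            toks = tokenize(line)
            vocab.update(toks)
            self.context.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        self.v = len(vocab) + 1  # vocabulary size plus one slot for unseen tokens

    def perplexity(self, line):
        # exp of the average negative log-probability per bigram transition.
        pairs = list(zip(tokenize(line), tokenize(line)[1:]))
        log_prob = 0.0
        for prev, cur in pairs:
            p = (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + self.v)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(pairs))


def jaccard(a, b):
    # Token-set overlap between two log lines, in [0, 1].
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def self_train_classify(seeds, lines, min_sim=0.3):
    """Label each line by its most similar labelled example.

    seeds is a non-empty list of (line, label) pairs. Confidently labelled
    lines are added to the pool as pseudo-labels (self-training); lines
    below min_sim stay unlabelled (None).
    """
    labelled = list(seeds)  # copy so the caller's seed list is not modified
    out = []
    for line in lines:
        text, label = max(labelled, key=lambda s: jaccard(s[0], line))
        if jaccard(text, line) >= min_sim:
            out.append(label)
            labelled.append((line, label))
        else:
            out.append(None)
    return out


def analyze(model, seeds, lines, threshold):
    # Stage 1 (assumed direction): keep only lines the model finds surprising.
    unusual = [l for l in lines if model.perplexity(l) > threshold]
    # Stage 2: fine-grained labelling of the surviving lines.
    return list(zip(unusual, self_train_classify(seeds, unusual)))


if __name__ == "__main__":
    routine = ["GET /index.html 200"] * 3 + ["GET /about.html 200"] * 2
    seeds = [("disk failure on node 3", "hardware"),
             ("login denied for user bob", "auth")]
    incoming = ["GET /index.html 200", "disk failure on node 7"]
    print(analyze(BigramModel(routine), seeds, incoming, threshold=4.0))
```

The cheap perplexity check runs on every line, while the more expensive labelling step only sees the lines that pass the filter, which is one way to trade speed against accuracy as the abstract describes. The threshold and min_sim values here are arbitrary and would need tuning on real logs.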

Bibliographic record

  • Source
    Multimedia Tools and Applications | 2019, Issue 15 | pp. 21521-21535 | 15 pages
  • Author affiliations

    Univ Shanghai Sci & Technol, Coll Commun & Art Design, Shanghai, Peoples R China|Univ Coll Dublin, Comp Sci & Informat, Dublin, Ireland;

    State St Corp, Boston, MA USA;

    Sanming Univ, Coll Informat Engn, Sanming, Peoples R China;

    Wuhan Inst Technol, Sch Comp Sci & Engn, Wuhan, Hubei, Peoples R China;

    Univ Shanghai Sci & Technol, Coll Commun & Art Design, Shanghai, Peoples R China;

    Qingdao Binhai Univ, Coll Informat Engn, Qingdao, Shandong, Peoples R China;

    Qingdao Binhai Univ, Coll Informat Engn, Qingdao, Shandong, Peoples R China;

    Qingdao Binhai Univ, Coll Informat Engn, Qingdao, Shandong, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Log analysis; Language model; Machine learning;
