Improving the system log analysis with language model and semi-supervised classifier

Abstract

Mining the vast amount of server-side logging data is an essential step in boosting business intelligence and in facilitating system maintenance for multimedia or IoT-oriented services. Given the volume of the data repository, designers of log analysis systems need to carefully balance processing speed against the accuracy of message classification. Conventional keyword-based log monitoring and classification is sufficiently fast, but it does not scale well in complex systems, especially when the target system is built by a large group of developers, each of whom may encode logging messages differently, and whose messages often carry misleading labels. Conversely, many of the more sophisticated approaches are time-consuming, so that delayed processing jobs begin to accumulate and timely decisions can hardly be supported. We also suggest that the design of a large-scale online log analysis system should follow the principle of requiring the least prior knowledge, under which unsupervised or semi-supervised solutions are preferred. In this paper, we propose a two-stage, machine-learning-based method in which the system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then passed through a fine-grained classification procedure. Empirical studies on our web services show that our method has a clear advantage in terms of processing speed and classification accuracy.
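The first stage described in the abstract, a perplexity-threshold pre-filter, can be illustrated with a minimal sketch. The abstract does not say which language model the authors use, so the add-one smoothed bigram model below is an assumption chosen for brevity, and the names `BigramModel`, `prefilter`, and the threshold value are hypothetical. The second-stage fine-grained classifier is not shown.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model over whitespace tokens (an assumed stand-in
    for whatever language model the paper actually uses)."""

    def __init__(self, lines):
        self.unigrams = Counter()  # counts of each token as a bigram context
        self.bigrams = Counter()   # counts of (previous, current) token pairs
        vocab = set()
        for line in lines:
            tokens = ["<s>"] + line.split() + ["</s>"]
            vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1
        # Reserve one extra slot so unseen tokens get non-zero probability.
        self.vocab_size = len(vocab) + 1

    def perplexity(self, line):
        """Per-token perplexity of one log line under the model."""
        tokens = ["<s>"] + line.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        transitions = len(tokens) - 1
        return math.exp(-log_prob / transitions)


def prefilter(model, lines, threshold):
    """Keep only lines whose perplexity exceeds the threshold.

    These are the lines the model finds surprising; in a two-stage design
    they would be handed to the slower fine-grained classifier.
    """
    return [line for line in lines if model.perplexity(line) > threshold]


# Hypothetical training data: routine messages the system emits constantly.
routine_logs = [
    "GET /index 200 ok",
    "GET /index 200 ok",
    "GET /home 200 ok",
    "POST /login 200 ok",
]
model = BigramModel(routine_logs)
incoming = ["GET /index 200 ok", "kernel panic stack overflow"]
suspicious = prefilter(model, incoming, threshold=6.0)
```

The design point is that scoring a line costs only a few dictionary lookups per token, so the bulk of routine messages can be discarded cheaply before any expensive classification runs. In practice the threshold would be tuned on held-out logs rather than fixed by hand.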

Bibliographic details

  • Source
    Multimedia Tools and Applications | 2019, Issue 15 | pp. 21521-21535 | 15 pages
  • Author affiliations

    Univ Shanghai Sci & Technol Coll Commun & Art Design Shanghai Peoples R China|Univ Coll Dublin Comp Sci & Informat Dublin Ireland;

    State St Corp Boston MA USA;

    Sanming Univ Coll Informat Engn Sanming Peoples R China;

    Wuhan Inst Technol Sch Comp Sci & Engn Wuhan Hubei Peoples R China;

    Univ Shanghai Sci & Technol Coll Commun & Art Design Shanghai Peoples R China;

    Qingdao Binhai Univ Coll Informat Engn Qingdao Shandong Peoples R China;

    Qingdao Binhai Univ Coll Informat Engn Qingdao Shandong Peoples R China;

    Qingdao Binhai Univ Coll Informat Engn Qingdao Shandong Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Log analysis; Language model; Machine learning;
