Venue: IEEE Annual Computers, Software, and Applications Conference
Trace-based Intelligent Fault Diagnosis for Microservices with Deep Learning

Abstract

Owing to their scalability, fault tolerance, and high availability, distributed microservice-based applications are gradually replacing traditional monolithic applications as one of the main forms of Internet applications. However, current fault diagnosis methods for distributed applications suffer from coarse-grained fault localization and inaccurate root-cause analysis. To address these issues, this paper proposes a trace-based intelligent fault diagnosis approach for microservices with deep learning. First, we build a request weighted directed graph and a request string from collected historical traces to characterize the behavior of microservices. Then, we build a normal trace dataset under normal operation and a faulty trace dataset by injecting faults, and calculate the expected intervals of the microservices' response times and the call sequences. After that, we train a fault diagnosis model based on a deep neural network with the trace datasets to diagnose faulty microservices. Finally, we deploy TrainTicket, a typical open-source microservice-based application, and validate our approach by injecting various typical faults. The results show that our approach effectively characterizes the behavior of microservices when processing requests and effectively detects faults: it achieves 91.5% accuracy in detecting faults and 85.2% accuracy in locating root causes.
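
The trace characterization described above can be illustrated with a minimal Python sketch. The abstract does not say how the graph weights, the request string, or the expected intervals are defined, so everything here is an assumption made for illustration: the span tuple layout, the service names, call counts as edge weights, start-time ordering for the request string, and a mean plus or minus k standard deviations interval. The deep neural network stage is not covered.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical span format (not from the paper):
# (trace_id, caller, callee, start_ms, duration_ms)
SPANS = [
    ("t1", "gateway", "order", 0, 120),
    ("t1", "order", "payment", 10, 60),
    ("t1", "order", "ticket", 75, 30),
    ("t2", "gateway", "order", 0, 130),
    ("t2", "order", "payment", 12, 64),
    ("t2", "order", "ticket", 80, 34),
]


def build_call_graph(spans):
    """Weighted directed graph as a mapping (caller, callee) -> call count.

    Using the call count as the edge weight is an assumption.
    """
    graph = Counter()
    for _, caller, callee, _, _ in spans:
        graph[(caller, callee)] += 1
    return graph


def request_string(spans, trace_id):
    """Serialize one trace as its calls ordered by start time (assumed encoding)."""
    calls = sorted((s for s in spans if s[0] == trace_id), key=lambda s: s[3])
    return ";".join(f"{caller}->{callee}" for _, caller, callee, _, _ in calls)


def expected_intervals(spans, k=3.0):
    """Per-callee interval: mean +/- k * population std of observed durations."""
    durations = defaultdict(list)
    for _, _, callee, _, duration in spans:
        durations[callee].append(duration)
    return {
        svc: (mean(d) - k * pstdev(d), mean(d) + k * pstdev(d))
        for svc, d in durations.items()
    }


def suspicious_services(spans, trace_id, known_strings, intervals):
    """Return callees whose duration falls outside the expected interval.

    If the trace's request string was never seen in normal traces,
    the marker "<unknown call sequence>" is included as well.
    """
    flagged = []
    if request_string(spans, trace_id) not in known_strings:
        flagged.append("<unknown call sequence>")
    for tid, _, callee, _, duration in spans:
        if tid != trace_id:
            continue
        low, high = intervals[callee]
        if not low <= duration <= high:
            flagged.append(callee)
    return flagged


normal_graph = build_call_graph(SPANS)
normal_strings = {request_string(SPANS, t) for t in ("t1", "t2")}
normal_intervals = expected_intervals(SPANS)

# A new trace in which the (hypothetical) payment service responds slowly.
faulty = [
    ("t3", "gateway", "order", 0, 125),
    ("t3", "order", "payment", 10, 900),
    ("t3", "order", "ticket", 75, 32),
]
flagged = suspicious_services(faulty, "t3", normal_strings, normal_intervals)
```

In this toy run, `flagged` contains only `"payment"`, since the call sequence of `t3` matches the normal traces and only the payment span exceeds its interval. A rule like this only produces candidate features or labels; in the approach summarized above, the final diagnosis comes from a trained neural network.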