Venue: IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice

MicroHECL: High-Efficient Root Cause Localization in Large-Scale Microservice Systems

Abstract

Availability issues of industrial microservice systems (e.g., drop of successfully placed orders and processed transactions) directly affect the running of the business. These issues are usually caused by various types of service anomalies which propagate along service dependencies. Accurate and high-efficient root cause localization is thus a critical challenge for large-scale industrial microservice systems. Existing approaches use service dependency graph based analysis techniques to automatically locate root causes. However, these approaches are limited due to their inaccurate detection of service anomalies and inefficient traversing of service dependency graph. In this paper, we propose a high-efficient root cause localization approach for availability issues of microservice systems, called MicroHECL. Based on a dynamically constructed service call graph, MicroHECL analyzes possible anomaly propagation chains, and ranks candidate root causes based on correlation analysis. We combine machine learning and statistical methods and design customized models for the detection of different types of service anomalies (i.e., performance, reliability, traffic). To improve the efficiency, we adopt a pruning strategy to eliminate irrelevant service calls in anomaly propagation chain analysis. Experimental studies show that MicroHECL significantly outperforms two state-of-the-art baseline approaches in terms of both accuracy and efficiency. MicroHECL has been used in Alibaba and achieves a top-3 hit ratio of 68% with root cause localization time reduced from 30 minutes to 5 minutes.
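The abstract names three steps: walking anomaly propagation chains over a service call graph, pruning irrelevant calls, and ranking candidate root causes by correlation. A minimal sketch of that flow follows. It is not the authors' implementation: the abstract does not specify the detection models, the pruning rule, or the correlation measure, so the `is_anomalous` callback, the Pearson-based pruning threshold `min_corr`, and all service names and metric values here are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def localize(calls, metrics, entry, is_anomalous, min_corr=0.7, top_k=3):
    """Return up to top_k candidate root causes, best first.

    calls:        dict mapping a service to the services it calls (hypothetical graph)
    metrics:      dict mapping a service to one metric time series (hypothetical data)
    entry:        service where the availability issue was first observed
    is_anomalous: callable(service) -> bool, a stand-in for the detection models
    min_corr:     assumed pruning threshold on |Pearson correlation|
    """
    candidates = []
    visited = {entry}
    stack = [entry]
    while stack:
        svc = stack.pop()
        extended = False
        for nxt in calls.get(svc, []):
            # Pruning (assumed rule): skip calls to healthy services and calls
            # whose metric does not move together with the caller's metric.
            if not is_anomalous(nxt):
                continue
            if abs(pearson(metrics[svc], metrics[nxt])) < min_corr:
                continue
            extended = True
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
        # A service where the chain cannot be extended is a candidate root cause.
        if not extended:
            candidates.append(svc)
    # Rank candidates by correlation with the metric of the entry service.
    candidates.sort(
        key=lambda s: abs(pearson(metrics[entry], metrics[s])), reverse=True
    )
    return candidates[:top_k]


# Hypothetical example: the anomaly propagates gateway -> orders -> payments.
calls = {"gateway": ["orders", "search"], "orders": ["payments", "inventory"]}
metrics = {
    "gateway": [1, 1, 1, 5, 6, 7],
    "orders": [1, 1, 1, 5, 6, 7],
    "payments": [2, 2, 2, 9, 10, 12],
    "inventory": [3, 3, 3, 3, 3, 3],
    "search": [4, 3, 4, 3, 4, 3],
}
anomalous = {"gateway", "orders", "payments"}
result = localize(calls, metrics, "gateway", lambda s: s in anomalous)
print(result)
```

In this toy graph the healthy `search` and `inventory` branches are never expanded, which is the effect the pruning strategy is meant to have on a much larger graph.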
