Conference: Middleware 2008

Diagnosing Distributed Systems with Self-propelled Instrumentation

Abstract

We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.
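
The flow-comparison step can be illustrated with a minimal sketch. The abstract does not specify how flows are represented or compared, so everything here is an assumption made for illustration: each flow is summarized as a map from function name to time spent, profiles are normalized, the Manhattan distance to the nearest neighbour serves as the anomaly score, and functions are ranked by deviation from the median of the other flows. The flow ids and function names (`job-1`, `retry_connect`, and so on) are hypothetical, and this is not presented as the authors' algorithm.

```python
from statistics import median


def normalize(profile):
    """Scale a flow's per-function times so they sum to 1."""
    total = sum(profile.values())
    return {fn: t / total for fn, t in profile.items()} if total else dict(profile)


def distance(a, b):
    """Manhattan distance between two normalized profiles (an assumed metric)."""
    return sum(abs(a.get(fn, 0.0) - b.get(fn, 0.0)) for fn in set(a) | set(b))


def most_anomalous_flow(flows):
    """Return the id of the flow farthest from its nearest neighbour.

    Requires at least two flows.
    """
    profiles = {fid: normalize(p) for fid, p in flows.items()}

    def score(fid):
        return min(
            distance(profiles[fid], profiles[other])
            for other in profiles
            if other != fid
        )

    return max(profiles, key=score)


def suspect_functions(flows, anomalous_id):
    """Rank functions by how far the anomalous flow deviates from the others' median."""
    profiles = {fid: normalize(p) for fid, p in flows.items()}
    target = profiles[anomalous_id]
    others = [p for fid, p in profiles.items() if fid != anomalous_id]
    functions = set(target).union(*others)
    deviation = {
        fn: abs(target.get(fn, 0.0) - median(p.get(fn, 0.0) for p in others))
        for fn in functions
    }
    return sorted(deviation, key=deviation.get, reverse=True)


# Hypothetical per-flow timings in seconds; three similar flows and one outlier.
flows = {
    "job-1": {"submit": 2.0, "match": 1.0, "run": 7.0},
    "job-2": {"submit": 2.1, "match": 0.9, "run": 7.0},
    "job-3": {"submit": 1.9, "match": 1.1, "run": 7.0},
    "job-4": {"submit": 1.0, "match": 1.0, "run": 2.0, "retry_connect": 6.0},
}

anomalous = most_anomalous_flow(flows)
ranked = suspect_functions(flows, anomalous)
print(anomalous, ranked[:2])
```

In this toy data the outlier flow spends most of its time in a function the other flows never call, so that function ranks first. A nearest-neighbour score is used rather than distance to a global mean so that a single extreme flow does not distort the baseline it is compared against.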
