IEEE Network Operations and Management Symposium

Eigen Space Based Method for Detecting Faulty Nodes in Large Scale Enterprise Systems

Abstract

In modern enterprise environments, detecting the anomaly behind a degradation in system performance is a hard problem. In such replicated environments, a single application may run on hundreds or even thousands of server nodes, and these nodes have implicit as well as explicit interdependencies. Furthermore, because the nodes in a cluster have heterogeneous capacities, the same fault may produce vastly different effects on the monitored metrics of different nodes. When a performance problem occurs, finding the faulty node(s) is a tedious and time-consuming exercise, given constantly changing workload, topology, and SLA requirements. In this paper we present a novel eigen-space-based technique that detects anomalies in an enterprise environment without any extra monitoring overhead. We monitor certain metrics on each node in the cluster, all of which are already available in an enterprise environment. The only historical information we need is a small number of the most recent samples of each monitored metric. Our technique adapts well to dynamic conditions, is simple to operate, and, in case of an anomaly, automatically produces a list of faulty node(s). We have implemented this method in a 3-tier cluster environment with a total of 13 nodes and tested it by introducing faults in the front tier, middle tier, and backend tier. In these tests the method always separated out the faulty nodes with high accuracy and precision.
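The abstract does not spell out the algorithm, so the following is only a generic sketch of how an eigen-space comparison across replicated nodes could look, not the authors' method. It assumes that each node keeps a short window of recent metric samples, that healthy replicas share a similar correlation structure among their metrics, and that a faulty node breaks that structure. The function names, the three synthetic metrics, and the `threshold` value are all hypothetical.

```python
import numpy as np


def leading_eigenvector(samples):
    """Leading eigenvector of one node's metric correlation matrix.

    samples: array of shape (n_samples, n_metrics), the recent window.
    """
    # Correlation is scale-free, so differences in node capacity drop out.
    corr = np.corrcoef(samples, rowvar=False)
    _, vecs = np.linalg.eigh(corr)  # eigenvalues come back in ascending order
    return vecs[:, -1]


def find_suspect_nodes(windows, threshold=0.1):
    """Score each node by how far its eigen-direction is from its peers'.

    windows: array of shape (n_nodes, n_samples, n_metrics).
    threshold: hypothetical cut-off on the score, to be tuned per deployment.
    """
    dirs = np.array([leading_eigenvector(w) for w in windows])
    # Consensus direction: leading eigenvector of the summed outer products,
    # which is insensitive to the arbitrary sign of each eigenvector.
    _, vecs = np.linalg.eigh(dirs.T @ dirs)
    consensus = vecs[:, -1]
    scores = 1.0 - np.abs(dirs @ consensus)  # 0 = aligned, 1 = orthogonal
    suspects = [i for i, s in enumerate(scores) if s > threshold]
    return suspects, scores


def make_demo_windows(n_nodes=6, n_samples=20, faulty=4, seed=0):
    """Synthetic data: three load-driven metrics per node, one node misbehaving."""
    rng = np.random.default_rng(seed)
    load = 50 + 30 * np.sin(np.linspace(0, 2 * np.pi, n_samples))
    gains = np.array([1.0, 0.5, 2.0])  # per-metric sensitivity to load
    windows = np.empty((n_nodes, n_samples, 3))
    for i in range(n_nodes):
        capacity = 1.0 + i  # heterogeneous node capacities
        driver = np.tile(load[:, None], (1, 3))
        if i == faulty:
            driver[:, 2] = 100 - load  # third metric falls as load rises
        noise = rng.normal(0.0, 2.0, size=(n_samples, 3))
        windows[i] = (driver + noise) * gains / capacity
    return windows


windows = make_demo_windows()
suspects, scores = find_suspect_nodes(windows)
print("suspect nodes:", suspects)
```

Working on correlations rather than raw values is one way to cope with the heterogeneous capacities mentioned in the abstract, and using only a 20-sample window mirrors its claim that a few recent samples suffice. A fixed threshold is the simplest possible decision rule; a real deployment would likely need something more adaptive.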
