Anomaly-based Self-Healing Framework in Distributed Systems

Abstract

Reliability and robustness to hardware and software failures are among the most important design criteria for distributed systems and their applications. The growing complexity, interconnectedness, and dependency of their components, which include hardware resources (computers, servers, network devices) and software (application services, middleware, web services, etc.), together with the asynchronous interactions among those components, make fault detection and tolerance a challenging research problem. In this dissertation, we present a self-healing methodology, based on the principles of autonomic computing and on statistical and data mining techniques, that detects faults (hardware or software) and identifies their source. Our approach monitors and analyzes in real time the interactions among all components of a distributed system using two software modules: the Component Fault Manager (CFM), which monitors the full set of measurement attributes for applications and nodes, and the Application Fault Manager (AFM), which is responsible for activities such as monitoring, anomaly analysis, root cause analysis, and recovery. We use a three-dimensional array of features to capture spatial and temporal characteristics; an anomaly analysis engine uses this array to generate an alert immediately when it detects an abnormal behavior pattern caused by a software or hardware failure. We evaluate the effectiveness of our self-healing approach against other techniques using several fault tolerance metrics (false positives, false negatives, precision, recall, missed alarm rate, detection accuracy, latency, and overhead). We applied our approach to an industry-standard e-commerce web application that emulates a complex e-commerce environment. We evaluate how effectively and efficiently the approach detects software faults that we inject asynchronously, and we compare the results for different noise levels.
Our experimental results show that the anomaly-based approach significantly improves the false positive, false negative, and missed alarm rates as well as detection accuracy. For example, for faults injected asynchronously, the approach achieves a detection rate above 99.9% with no false alarms across a wide range of faulty and normal operational scenarios.
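
The abstract names several evaluation metrics without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions (the dissertation's exact formulas are not given here), the Python below shows how precision, recall, false positive rate, missed alarm rate, and detection accuracy could be derived from detector outcomes. The names `DetectionCounts` and `detection_metrics` are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion counts for a fault detector (hypothetical container)."""
    true_positives: int   # faulty observations flagged as faulty
    false_positives: int  # normal observations flagged as faulty
    true_negatives: int   # normal observations left unflagged
    false_negatives: int  # faulty observations left unflagged


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: DetectionCounts) -> dict[str, float]:
    """Compute textbook detection metrics (assumed definitions)."""
    faulty = c.true_positives + c.false_negatives   # all truly faulty cases
    normal = c.false_positives + c.true_negatives   # all truly normal cases
    flagged = c.true_positives + c.false_positives  # all raised alerts
    return {
        "precision": _ratio(c.true_positives, flagged),
        "recall": _ratio(c.true_positives, faulty),
        "false_positive_rate": _ratio(c.false_positives, normal),
        # Missed alarm rate is taken here as the false negative rate.
        "missed_alarm_rate": _ratio(c.false_negatives, faulty),
        "accuracy": _ratio(c.true_positives + c.true_negatives, faulty + normal),
    }
```

With made-up counts such as `DetectionCounts(true_positives=8, false_positives=2, true_negatives=88, false_negatives=2)`, calling `detection_metrics` gives a precision and recall of 0.8 each. Under these definitions, the reported result of "no false alarms" corresponds to a false positive count of zero, and the 99.9% detection rate would correspond to recall.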

Bibliographic details

  • Author: Kim Byoung Uk
  • Year: 2008
  • Original format: PDF
  • Language: English
