Journal: IEEE Transactions on Dependable and Secure Computing

Probabilistic Model-Driven Recovery in Distributed Systems

Abstract

Automatic system monitoring and recovery has the potential to provide effective, low-cost ways to improve dependability in distributed software systems. However, automating recovery is challenging in practice because accurate fault diagnosis is hampered by monitoring tools and techniques that often have low fault coverage, poor fault localization, detection delays, and false positives. In this paper, we present a holistic model-based approach that overcomes these challenges and enables automatic recovery in distributed systems. To do so, it uses theoretically sound techniques, including Bayesian estimation and Markov decision theory, to provide controllers that choose good, if not optimal, recovery actions according to a user-defined optimization criterion. By combining monitoring and recovery, the approach realizes benefits that could not have been obtained by using either in isolation. We experimentally validate our framework by fault injection on realistic e-commerce systems.
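
The idea of pairing Bayesian estimation with a cost-driven choice of recovery action can be illustrated with a toy example. The sketch below is not the paper's controller: the fault hypotheses, monitor likelihoods, action costs, and function names are all invented for illustration, and it replaces a full Markov decision process with a greedy one-step expected-cost choice.

```python
# Toy illustration only. Every name and number here is a made-up assumption,
# and the greedy one-step choice stands in for a full Markov decision process.

def bayes_update(prior, likelihood, observation):
    """Return the posterior over fault hypotheses after one monitor output."""
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def best_action(belief, cost):
    """Pick the recovery action with the lowest expected cost under belief."""
    expected = {
        action: sum(belief[h] * per_fault[h] for h in belief)
        for action, per_fault in cost.items()
    }
    return min(expected, key=expected.get), expected


# Hypothetical prior belief about the system state.
prior = {"ok": 0.90, "db_down": 0.05, "web_down": 0.05}

# Hypothetical P(monitor output | state); the monitor has false positives
# (alarm while ok) and imperfect coverage (quiet while a fault is present).
likelihood = {
    "ok": {"alarm": 0.05, "quiet": 0.95},
    "db_down": {"alarm": 0.90, "quiet": 0.10},
    "web_down": {"alarm": 0.80, "quiet": 0.20},
}

# Hypothetical cost of taking each action in each true state.
cost = {
    "do_nothing": {"ok": 0, "db_down": 100, "web_down": 100},
    "restart_db": {"ok": 20, "db_down": 20, "web_down": 120},
    "restart_web": {"ok": 10, "db_down": 110, "web_down": 10},
}

belief = bayes_update(prior, likelihood, "alarm")
action, expected = best_action(belief, cost)
print(belief)
print(action, expected)
```

With these invented numbers a single alarm leaves the diagnosis ambiguous, so the choice is driven by the cost table rather than by the most likely fault alone. That is the kind of trade-off a user-defined optimization criterion is meant to capture.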