...
Published in: IEEE Transactions on Parallel and Distributed Systems

Reliability of Heterogeneous Distributed Computing Systems in the Presence of Correlated Failures


Abstract

While the reliability of distributed-computing systems (DCSs) has been widely studied under the assumption that computing elements (CEs) fail independently, the impact of correlated CE failures on reliability remains an open question. This work tackles the problem of modeling and assessing the impact of stochastic, correlated failures on the service reliability of applications running on DCSs. Service reliability is modeled using an integrated analytical and Monte Carlo (MC) approach. The analytical component generalizes a previously developed reliability model for non-Markovian DCSs to a setting that allows specific patterns of simultaneous CE failures. It is complemented by an MC-based procedure that draws correlated-failure patterns using the recently reported concept of probabilistic shared risk groups (PSRGs). The reliability model is further used to develop and optimize a novel class of dynamic task reallocation (DTR) policies that maximize the reliability of DCSs in the presence of correlated failures. Theoretical predictions, MC simulations, and results from an emulation testbed show that reliability can be improved when DTR policies correctly account for correlated failures. The impact of correlated CE failures on reliability, and the key dependence of DTR policies on the type of correlated failures, are also investigated.
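The abstract does not spell out the model, so the following is only a minimal sketch of the MC idea under our own simplifying assumptions, not the authors' method. It uses a static series model: the application succeeds only if every CE hosting one of its tasks survives, with no task execution times and no reallocation. It also uses one simple reading of a PSRG: with probability `p_activate` the group's shared risk occurs, and each member CE then fails with probability `p_member`. The names `RiskGroup`, `sample_failed_ces`, `mc_service_reliability` and `exact_service_reliability` are hypothetical. Because groups act independently in this toy model, a closed form is included to check the MC estimate.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskGroup:
    """Hypothetical PSRG (assumed semantics): with probability p_activate the
    shared risk occurs, and then each member CE fails independently with p_member."""
    members: frozenset
    p_activate: float
    p_member: float


def sample_failed_ces(groups, rng):
    """Draw one correlated-failure pattern (the set of failed CE ids)."""
    failed = set()
    for g in groups:
        if rng.random() < g.p_activate:  # shared risk is realized
            for ce in g.members:
                if rng.random() < g.p_member:
                    failed.add(ce)
    return failed


def mc_service_reliability(allocation, groups, n_trials=100_000, seed=0):
    """MC estimate of P(every CE hosting a task survives).

    allocation maps task id -> CE id. Static series model with no task
    timing and no reallocation (simplifying assumptions)."""
    rng = random.Random(seed)
    used = set(allocation.values())
    ok = sum(
        used.isdisjoint(sample_failed_ces(groups, rng)) for _ in range(n_trials)
    )
    return ok / n_trials


def exact_service_reliability(allocation, groups):
    """Closed form for the same toy model: groups act independently, and a
    group spares its k used members with prob (1 - p_a) + p_a * (1 - p_m)**k."""
    used = set(allocation.values())
    r = 1.0
    for g in groups:
        k = len(g.members & used)
        r *= (1 - g.p_activate) + g.p_activate * (1 - g.p_member) ** k
    return r


if __name__ == "__main__":
    alloc = {"t1": 0, "t2": 1}  # two tasks placed on CEs 0 and 1
    # Each CE has the same marginal failure probability (0.1) in both cases.
    correlated = [RiskGroup(frozenset({0, 1}), 0.1, 1.0)]
    independent = [RiskGroup(frozenset({0}), 0.1, 1.0),
                   RiskGroup(frozenset({1}), 0.1, 1.0)]
    for name, groups in (("correlated", correlated), ("independent", independent)):
        print(name, exact_service_reliability(alloc, groups),
              mc_service_reliability(alloc, groups))
```

In this toy case each CE fails with probability 0.1 in both scenarios. The exact reliability is 0.9 when the two CEs share one risk group, and 0.81 = 0.9 × 0.9 when they fail independently, so a model that assumes independence would misestimate it. Under this series model, positive correlation raises the all-survive probability. The paper's setting, with task timing and dynamic reallocation, is richer and may behave differently.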