Conference: IEEE International Symposium on Software Reliability Engineering

Identifying and Prioritizing Chaos Experiments by Using Established Risk Analysis Techniques

Abstract

The prevalence of microservice architectures and container orchestration technologies increases the complexity of assessing such systems' resilience. Chaos engineering is an emerging approach to resilience assessment: faults are intentionally injected into a distributed system, and hypotheses are tested by observing customer- and business-affecting metrics. Because the number of potential risks in a complex system is high, identifying and prioritizing effective and efficient chaos experiments is non-trivial. In the scope of an industrial case study, this work investigates means to identify and prioritize chaos experiments by using established risk analysis techniques known from engineering safety-critical systems, namely i) Fault Tree Analysis, ii) Failure Mode and Effects Analysis, and iii) Computer Hazard and Operability Study. We conducted semi-structured interviews to elicit architectural information and resilience requirements of the case study system. The extracted knowledge was leveraged during the application of the risk analysis techniques. A subset of the identified and prioritized risks was used to create and execute chaos experiments. The risk analysis resulted in over 100 findings and revealed that the system is rather fragile, as it contains a large number of single points of failure. The chaos experiments revealed further weaknesses related to previously unknown system behavior.
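
As an illustration of how a Failure Mode and Effects Analysis could feed experiment prioritization, the following minimal sketch ranks candidate failure modes by the conventional Risk Priority Number (severity × occurrence × detection). The abstract does not state which scoring scheme the study used, so the RPN ranking, the `FailureMode` class, the `prioritize` function, and all component names and ratings below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row; every name and rating here is hypothetical."""
    component: str
    failure: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always detected) .. 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


def prioritize(modes, top_n=None):
    """Return failure modes sorted by descending RPN, optionally truncated."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical findings from a risk analysis session.
candidates = [
    FailureMode("api-gateway", "pod crash", 8, 4, 3),
    FailureMode("orders-db", "network partition", 9, 2, 6),
    FailureMode("cache", "high latency", 4, 6, 2),
]

# The highest-ranked entries would be the first candidates for chaos experiments.
for mode in prioritize(candidates, top_n=2):
    print(f"{mode.rpn:4d}  {mode.component}: {mode.failure}")
```

Under these assumed ratings, the database partition ranks above the gateway crash even though it is rated as less frequent, because it is rated as both more severe and harder to detect. A real study would need to calibrate the rating scales with the system's stakeholders.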
