Conference: Network Operations and Management Symposium (NOMS), 2012 IEEE

Efficient verification of IT change operations or: How we could have prevented Amazon's cloud outage



Abstract

On April 21st, 2011, a major outage occurred in Amazon's US East Coast data center, causing significant disruptions to customer services. The root cause of the outage was an IT change that shifted traffic off one router onto a redundant router so that a network upgrade could be carried out. The change was executed incorrectly: the router that was picked could not handle the traffic because of capacity constraints. As a result, network outages occurred, ultimately leading to service unavailability for customers as well as temporary and even permanent data loss. We propose an object-oriented verification technique that detects conflicts between IT change operations and safety constraints, such as network capacity constraints, in a verification phase before the IT changes are executed. Based on Amazon's incident report, we show that different scenarios in static and dynamic routing environments that cause a network overload can be detected by logical verification. The verification algorithm is proven to be sound and has linear runtime complexity for Amazon's network overload scenarios. A performance analysis confirms the theoretical results and indicates that the approach scales to thousands of IT changes and safety constraints.
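To make the idea of checking IT changes against capacity constraints before execution concrete, here is a minimal Python sketch that projects router loads after a batch of traffic-shifting changes and reports every router whose capacity would be exceeded. It is a simple forward simulation, not the authors' object-oriented logical verification; the `ShiftTraffic` operation, the `verify_changes` function, the router names, and the load and capacity figures are all hypothetical. Like the paper's algorithm for the overload scenarios, it runs in time linear in the number of changes plus the number of constrained routers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftTraffic:
    """Hypothetical change operation: move all traffic from `source` to `target`."""
    source: str
    target: str


def verify_changes(loads, capacities, changes):
    """Return (router, projected_load, capacity) for every violated capacity constraint.

    The current loads are copied, so the input is never modified. Routers
    without an entry in `capacities` are treated as unconstrained.
    Runtime is O(len(changes) + len(capacities)).
    """
    projected = dict(loads)
    # Apply each change in order to the projected (not the live) network state.
    for change in changes:
        moved = projected.get(change.source, 0)
        projected[change.source] = 0
        projected[change.target] = projected.get(change.target, 0) + moved
    # Check every capacity (safety) constraint against the projected loads.
    violations = []
    for router, capacity in capacities.items():
        load = projected.get(router, 0)
        if load > capacity:
            violations.append((router, load, capacity))
    return violations


# Hypothetical example: traffic units per router and their capacities.
loads = {"primary-a": 60, "primary-b": 30, "backup": 5}
capacities = {"primary-a": 100, "primary-b": 100, "backup": 20}

print(verify_changes(loads, capacities, [ShiftTraffic("primary-a", "backup")]))
# → [('backup', 65, 20)]
print(verify_changes(loads, capacities, [ShiftTraffic("primary-a", "primary-b")]))
# → []
```

A change batch would be approved only if the returned list is empty. In the first call, shifting `primary-a` onto the low-capacity `backup` router is flagged before anything is executed, which mirrors the kind of misrouted change described in the incident.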
