Annual IEEE/IFIP International Conference on Dependable Systems and Networks

What Can We Learn from Four Years of Data Center Hardware Failures?

Abstract

Hardware failures have a big impact on the dependability of large-scale data centers. We study more than 290,000 hardware failure reports collected over the past four years from dozens of data centers containing hundreds of thousands of servers. We analyze the dataset statistically to characterize failures along the temporal, spatial, product-line, and component dimensions. We focus in particular on correlations among failures, including batch and repeating failures, and on how human operators respond to failures. We reconfirm or extend findings from previous studies. We also identify many new failure and recovery patterns that are undesirable by-products of state-of-the-art data center hardware and software design.
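The abstract names batch and repeating failures without defining them. As a rough illustration only, the following sketch flags both in a list of failure reports under assumed definitions: a "batch" is at least `threshold` reports for the same component type on the same calendar day, and a "repeat" is a report for a server and component that already failed within the preceding `window`. The record fields (`server_id`, `component`, `time`), the one-day batch window, the threshold of 3, and the 30-day repeat window are all hypothetical choices, not the paper's definitions or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailureReport:
    # Hypothetical record layout; the paper's report schema is not given here.
    server_id: str
    component: str  # e.g. "disk", "memory"
    time: datetime


def batch_failures(reports, threshold=3):
    """Group reports by (component, calendar day) and keep groups with at
    least `threshold` reports (assumed definition of a batch failure)."""
    groups = defaultdict(list)
    for r in reports:
        groups[(r.component, r.time.date())].append(r)
    return {key: rs for key, rs in groups.items() if len(rs) >= threshold}


def repeating_failures(reports, window=timedelta(days=30)):
    """Return reports whose (server, component) pair already failed within
    `window` before them (assumed definition of a repeating failure)."""
    last_seen = {}
    repeats = []
    for r in sorted(reports, key=lambda r: r.time):
        key = (r.server_id, r.component)
        prev = last_seen.get(key)
        if prev is not None and r.time - prev <= window:
            repeats.append(r)
        last_seen[key] = r.time
    return repeats


# Illustrative, made-up reports (not data from the study).
t0 = datetime(2016, 1, 1, 8, 0)
reports = [
    FailureReport("s1", "disk", t0),
    FailureReport("s2", "disk", t0 + timedelta(hours=2)),
    FailureReport("s3", "disk", t0 + timedelta(hours=5)),
    FailureReport("s1", "disk", t0 + timedelta(days=10)),
    FailureReport("s4", "memory", t0),
]

print(sorted(batch_failures(reports)))  # [('disk', datetime.date(2016, 1, 1))]
print([(r.server_id, r.component) for r in repeating_failures(reports)])  # [('s1', 'disk')]
```

Grouping by calendar day keeps the batch check simple; a sliding time window would catch batches that straddle midnight, at the cost of a slightly more involved scan.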