Journal: ACM Transactions on Storage

Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems


Abstract

Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types such as disk, SSD, CPU, memory, and network components can exhibit performance faults. We made several important observations such as faults convert from one form to another, the cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.
