
Reliability models for HPC applications and a Cloud economic model.


Abstract

With the enormous number of computing resources in HPC and Cloud systems, failures become a major concern. Failure characteristics such as reliability, failure rate, and mean time to failure therefore need to be understood in order to manage such large systems efficiently.

This dissertation makes three major contributions to HPC and Cloud studies. First, a reliability model with correlated failures in a k-node system for HPC applications is studied. The model is extended to improve accuracy by accounting for failure correlation. The Marshall-Olkin multivariate Weibull distribution is refined using the excess-life (conditional Weibull) distribution to better estimate system reliability. A univariate method is also proposed for estimating the Marshall-Olkin multivariate Weibull parameters of a system composed of a large number of nodes. The failure rate and mean time to failure are then derived. The model is validated using log data from the Blue Gene/L system at LLNL. Results show that when node failures in the system are correlated, the system becomes less reliable.

Second, a reliability model of Cloud computing is proposed. Reliability, mean time to failure, and failure rate are estimated for a system of k nodes and s virtual machines under four scenarios: 1) hardware components fail independently, and software components fail independently; 2) software failures are independent, and hardware failures are correlated; 3) software failures are correlated, and hardware failures are independent; and 4) dependent software and hardware failures. Results show that if the failures of the nodes and/or software in the system possess a degree of dependency, the system becomes less reliable. An increase in the number of computing components also decreases the reliability of the system.

Finally, an economic model for a Cloud service provider is proposed. This economic model aims to maximize profit through appropriate pricing and rightsizing of the Cloud data center. Total cost is a key element of the model and is analyzed by considering the Total Cost of Ownership (TCO) of the Cloud.
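The abstract does not give the dissertation's formulas, so the following is only an illustrative sketch of the kind of quantities it names (reliability and mean time to failure of a k-node system). It assumes a series system in which every node must survive, identical Weibull failure processes per node, and an optional common Weibull shock with the same shape parameter that brings down all nodes at once, in the spirit of a Marshall-Olkin construction. The shock is added on top of the per-node failure processes, so node marginals are not held fixed. The function names and parameters (`series_reliability`, `series_mttf`, `shock_scale`) are hypothetical and are not taken from the dissertation.

```python
import math


def series_reliability(t, k, scale, shape, shock_scale=None):
    """Probability that a k-node series system survives to time t.

    Assumption: each node fails by its own Weibull(scale, shape) process.
    If shock_scale is given, one extra Weibull(shock_scale, shape) shock
    hits all nodes simultaneously (Marshall-Olkin-style dependence).
    """
    # Cumulative hazards of independent failure sources add up.
    cumulative_hazard = k * (t / scale) ** shape
    if shock_scale is not None:
        cumulative_hazard += (t / shock_scale) ** shape
    return math.exp(-cumulative_hazard)


def series_mttf(k, scale, shape, shock_scale=None):
    """Mean time to failure of the same system.

    With a shared shape parameter, the system lifetime is again Weibull,
    so the mean has the closed form system_scale * Gamma(1 + 1/shape).
    """
    rate = k / scale ** shape
    if shock_scale is not None:
        rate += 1.0 / shock_scale ** shape
    system_scale = rate ** (-1.0 / shape)
    return system_scale * math.gamma(1.0 + 1.0 / shape)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for k in (16, 64):
        independent = series_mttf(k, scale=1000.0, shape=0.8)
        with_shock = series_mttf(k, scale=1000.0, shape=0.8, shock_scale=200.0)
        print(k, independent, with_shock)
```

Under these assumptions, adding the common shock or increasing k can only raise the cumulative hazard, which matches the direction of the results reported in the abstract. The sketch says nothing about how the dissertation estimates parameters or treats the excess-life distribution.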

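Bibliographic record

  • Author affiliation: Louisiana Tech University
  • Degree-granting institution: Louisiana Tech University
  • Subjects: Statistics; Computer Engineering; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 84 p.
  • Total pages: 84
  • Original format: PDF
  • Language of text: English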
