A Proactive Fault Tolerance Approach to High Performance Computing (HPC) in the Cloud

Abstract

Cloud computing offers new computing paradigms, capacity, and flexibility to high performance computing (HPC) applications by provisioning large numbers of Virtual Machines (VMs) for computation-intensive applications under the Hardware as a Service (HaaS) model. However, because HPC systems in the cloud comprise so many VMs and electronic components, any fault during execution would force the application to be re-run, costing time, money, and energy. In this paper we present a proactive Fault Tolerance (FT) approach to HPC systems in the cloud that reduces wall clock execution time in the presence of faults. We develop a generic FT algorithm for HPC systems in the cloud; the algorithm does not rely on a spare node prior to prediction of a failure. We analyze the dollar cost of provisioning spare nodes to assess the value of our approach. Experimental results from a real cloud execution environment show that the wall clock execution time of computation-intensive applications in the cloud can be reduced by as much as 30%. Compared to current FT approaches, our approach can also reduce the checkpointing frequency of computation-intensive applications to 50%.

Bibliographic record

Source
International Conference on Social Computing and Its Applications; International Symposium on Big Data and MapReduce; International Symposium on Privacy and Security in Cloud and Social; International Workshop on Web Wisdom; International Workshop on Society Network Analysis and Information Diffusion Modeling; International Workshop on Social Network Service on Databases | 2012 | 6 pages
Authors
Egwutuoha Ifeanyi P.; Chen Shiping; Levy David; Selic Bran; Calvo Rafael
Original format
PDF
Chinese Library Classification
TP301-53
Keywords
HPC; HaaS; Proactive Fault tolerance; cloud computing; computation-intensive application

Similar literature

1. Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud [J]. Ifeanyi P. Egwutuoha, Shiping Chen, David Levy. Parallel Algorithms and Applications. 2014, No. 3a4
2. Fault Tolerance Approach To Improve Performance Computation Of Biological Jobs Using Cloud Computing [J]. P Padmakumari. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2016, No. 2
3. Fault Tolerance Approach To Improve Performance Computation Of Biological Jobs Using Cloud Computing [J]. E. Kadivar, Kh. Rahimi, M. A. Shahzamanian. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2016, No. 2
4. A Proactive Fault Tolerance Approach to High Performance Computing (HPC) in the Cloud [C]. Egwutuoha Ifeanyi P., Chen Shiping, Levy David. The Second International Conference on Cloud and Green Computing. 2012
5. Proactive Approach for the Prevention of DDoS Attacks in Cloud Computing Environments [D]. Alshehry, Badr. 2016
6. ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community [O]. Tiziana Castrignanò, Silvia Gioiosa, Tiziano Flati. 2020
7. A proactive fault tolerance framework for high performance computing (HPC) systems in the cloud [O]. Egwutuoha Ifeanyi Paulinus. 2014
8. High Performance Computing (HPC) Innovation Service Portal Pilots Cloud Computing (HPC-ISP Pilot Cloud Computing) [R]. Hochstein, L. 2011