Checkpointing as a Service in Heterogeneous Cloud Environments

Abstract

A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerance mechanism for each application, and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher-priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another.
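
The health-monitoring behavior described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `HealthSample`, `Action`, and `decide`, the progress metric, and the `low_fraction` threshold are all hypothetical, and restarting a failed job from its last checkpoint is an assumption about how a checkpoint-restart service would typically react.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    SUSPEND = "suspend"   # checkpoint the job and release its resources
    RESTART = "restart"   # assumption: resume a failed job from its last checkpoint


@dataclass
class HealthSample:
    alive: bool      # whether the job answered the health probe
    progress: float  # hypothetical metric: work units finished since the last probe


def decide(sample: HealthSample, expected_progress: float,
           low_fraction: float = 0.1) -> Action:
    """Map one health sample to a monitoring action.

    low_fraction is an assumed threshold: progress below this fraction
    of the expected rate is treated as exceptionally low performance.
    """
    if not sample.alive:
        return Action.RESTART
    if sample.progress < low_fraction * expected_progress:
        return Action.SUSPEND
    return Action.CONTINUE
```

In this sketch the policy is a pure function of one sample, so the same decision logic could sit behind any cloud platform's monitoring hook; a real monitor would likely smooth over several samples before suspending a job.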

Bibliographic record

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015 | pp. 61-70 | 10 pages
Authors
Jiajun Cao; Matthieu Simonin; Gene Cooperman; Christine Morin
Original format: PDF
Keywords
checkpoint-restart; cloud computing; distributed application; infrastructure-as-a-service; scalability; self-healing; virtualization

Similar literature

1. Towards a Cloud Service Standardization to ensure interoperability in heterogeneous Cloud based environment [J]. Majda Elhozmari, Ahmed Ettalbi. International Journal of Computer Science and Network Security. 2016, no. 7
2. Dynamic Request Redirection and Resource Provisioning for Cloud-Based Video Services under Heterogeneous Environment [J]. Wenhua Xiao, Weidong Bao, Xiaomin Zhu. IEEE Transactions on Parallel and Distributed Systems. 2016, no. 7
3. Cloud-Based Parameter-Driven Statistical Services and Resource Allocation in a Heterogeneous Platform on Enterprise Environment [J]. Sungju Lee, Taikyeong Jeong. Symmetry. 2016, no. 10
4. Checkpointing as a Service in Heterogeneous Cloud Environments [C]. Jiajun Cao, Matthieu Simonin, Gene Cooperman. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2015
5. Efficient checkpointing for heterogeneous collaborative environments: Representation, coordination, and automation [D]. Kasidit Chanchio. 2000
6. Secure Encapsulation and Publication of Biological Services in the Cloud Computing Environment [O]. Weizhe Zhang, Xuehui Wang, Bo Lu. 2006
7. Checkpointing as a Service in Heterogeneous Cloud Environments [O]. Jiajun Cao, Matthieu Simonin, Gene Cooperman. 2016
