An energy-aware fault tolerant scheduling framework for soft error resilient cloud computing systems


Abstract

For modern high-performance systems, aggressive technology and voltage scaling have drastically increased their susceptibility to soft errors. At the grand scale of cloud computing, soft-error-induced failures will clearly occur far more frequently, but it is unclear how to apply current error detection and fault tolerance techniques effectively at scale. In this paper, we focus on energy-aware fault-tolerant scheduling in public, multi-user cloud systems, and explore the three-way tradeoff between reliability (in terms of soft error resiliency), performance, and energy. Through a systematically optimized process of resource allocation, error detection approach selection, virtual machine placement, spatial/temporal redundancy augmentation, and task scheduling, the cloud service provider can achieve high error coverage and fault tolerance confidence while minimizing global energy costs under user deadline constraints. Our scheduling algorithm includes a static scheduling phase that operates on task-graph-based workload inputs prior to execution, and a lightweight dynamic scheduler that migrates tasks during execution in case of excessive re-executions. All schedules are evaluated on a runtime simulation engine that (1) mimics the performance fluctuations in cloud systems, and (2) supports the injection of arbitrary fault patterns. Compared to current virtual machine or task replication techniques, we are able to reduce overall application failure rates by over 50% with approximately 76% total energy overhead.
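The three-way tradeoff described above can be made concrete with a small calculation. The Python sketch below is not the authors' model: it assumes that soft errors arrive as an independent Poisson process with a fixed rate, that every error is detected, and that power draw is constant. Under those assumptions it compares temporal redundancy (re-executing a task after a detected error) with spatial redundancy (running replicas on separate VMs), and picks the option with the lowest expected energy that still meets a deadline and a failure-probability target. All function and parameter names (`cheapest`, `target_fail`, `max_copies`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    name: str          # "temporal" (re-execution) or "spatial" (replication)
    copies: int        # number of attempts (temporal) or replicas (spatial)
    fail_prob: float   # probability that every copy is hit by a soft error
    exp_energy: float  # expected energy, in joules
    worst_time: float  # worst-case completion time, in seconds


def p_corrupt(rate: float, duration: float) -> float:
    """Probability that one run of `duration` seconds sees at least one soft error,
    assuming (hypothetically) a Poisson process with `rate` errors per second."""
    return 1.0 - math.exp(-rate * duration)


def temporal(rate: float, duration: float, power: float, attempts: int) -> Option:
    q = p_corrupt(rate, duration)
    # Attempt i (0-based) only runs if the i earlier attempts were all corrupted.
    energy = power * duration * sum(q ** i for i in range(attempts))
    return Option("temporal", attempts, q ** attempts, energy, attempts * duration)


def spatial(rate: float, duration: float, power: float, replicas: int) -> Option:
    q = p_corrupt(rate, duration)
    # Replicas run concurrently on separate VMs, so every replica consumes energy.
    return Option("spatial", replicas, q ** replicas, power * duration * replicas, duration)


def cheapest(rate: float, duration: float, power: float, deadline: float,
             target_fail: float, max_copies: int = 4) -> Optional[Option]:
    """Lowest-expected-energy option meeting both the deadline and the reliability target."""
    candidates = []
    for k in range(1, max_copies + 1):
        candidates.append(temporal(rate, duration, power, k))
        candidates.append(spatial(rate, duration, power, k))
    feasible = [o for o in candidates
                if o.worst_time <= deadline and o.fail_prob <= target_fail]
    return min(feasible, key=lambda o: o.exp_energy, default=None)


# Loose deadline: re-execution wins; tight deadline: replication is required.
print(cheapest(1e-3, 100.0, 200.0, deadline=350.0, target_fail=1e-3).name)  # temporal
print(cheapest(1e-3, 100.0, 200.0, deadline=150.0, target_fail=1e-3).name)  # spatial
```

With the same number of copies, both forms of redundancy reach the same failure probability in this simplified model, so re-execution is cheaper in expected energy whenever the deadline leaves room for the extra attempts; when it does not, replication buys time at the cost of energy.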

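The abstract's dynamic scheduler migrates tasks when re-executions become excessive, and its simulation engine accepts arbitrary fault patterns. The sketch below shows one possible form of such a rule and is an illustration only: the threshold `max_reexec`, the attempt budget `max_attempts`, the VM names, and the `is_faulty(vm, attempt)` fault-pattern callback are all hypothetical, and the abstract does not say how the authors choose a migration target (here it is simply the next VM in a list).

```python
from typing import Callable, List, Tuple


def run_with_migration(vms: List[str],
                       is_faulty: Callable[[str, int], bool],
                       max_reexec: int,
                       max_attempts: int) -> Tuple[bool, List[str]]:
    """Run one task, re-executing it after every detected soft error.

    `is_faulty(vm, attempt)` is an injected fault pattern (hypothetical): it
    returns True when that attempt on that VM is hit by a detected soft error.
    After `max_reexec` failed re-executions on one VM, the task migrates to
    the next VM in `vms`, if there is one.
    Returns (succeeded, VM used for each attempt).
    """
    trace: List[str] = []
    vm_index = 0
    failures_here = 0
    for attempt in range(max_attempts):
        vm = vms[vm_index]
        trace.append(vm)
        if not is_faulty(vm, attempt):
            return True, trace  # error-free run: the task completes
        failures_here += 1
        # The first run plus `max_reexec` re-executions have all failed here.
        if failures_here > max_reexec and vm_index + 1 < len(vms):
            vm_index += 1       # excessive re-executions: migrate
            failures_here = 0
    return False, trace  # attempt budget (e.g. derived from a deadline) exhausted


# Fault pattern: every run on "vm-a" is corrupted, "vm-b" is clean.
ok, trace = run_with_migration(["vm-a", "vm-b"], lambda vm, _: vm == "vm-a",
                               max_reexec=1, max_attempts=5)
print(ok, trace)  # True ['vm-a', 'vm-a', 'vm-b']
```

Passing the fault pattern as a callback keeps the rule deterministic and testable: any injected pattern, such as a VM that always fails or a burst of errors on early attempts, can be replayed exactly.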

Bibliographic Information

Source: Design, Automation & Test in Europe Conference and Exhibition, 2014, pp. 1-6 (6 pages)
Conference location: Dresden, Germany
Authors: Gao, Yue; Gupta, Sandeep K.; Wang, Yanzhi; Pedram, Massoud
Affiliation: Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, USA
Original format: PDF

Similar Literature

1. Fault tolerant software systems using software configurations for cloud computing [J]. Mylara Reddy Chinnaiah, Nalini Niranjan. Journal of Cloud Computing: Advances, Systems and Applications. 2018, No. 1.
2. Energy-Aware Fault-Tolerant Dynamic Task Scheduling Scheme for Virtualized Cloud Data Centers [J]. Marahatta Avinab, Wang Youshi, Zhang Fa. Mobile Networks & Applications. 2019, No. 3.
3. An energy-aware fault tolerant scheduling framework for soft error resilient cloud computing systems [C]. Gao Yue, Gupta Sandeep K., Wang Yanzhi. Design, Automation & Test in Europe Conference and Exhibition. 2014.
4. Energy-aware Fault-tolerant Scheduling for Hard Real-time Systems [D]. Han, Qiushi. 2015.
5. Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments [O]. Sonia Yassa, Rachid Chelouah, Hubert Kadima. 2013.
6. Shadow replication: An energy-aware, fault-tolerant computational model for green cloud computing [O]. Cui X, Mills B, Znati T. 2014.
7. Integrated Modeling and Analysis of Fault-Tolerant Systems with Fault-Tolerant Software [R]. Collins, A. S. 1980.