Conference: Annual Reliability and Maintainability Symposium

Using resource monitoring to select recovery strategies

Abstract

Distributed heterogeneous embedded systems that control infrastructures, such as the electric power infrastructure, must provide reliable services regardless of faults and changes in the environment. A fault-tolerance middleware architecture containing mechanisms for adapting quality of service (QoS) has been developed to assure dependable control of the infrastructure's components. Recovery strategies allow the system to be reconfigured (e.g., graceful degradation) based on the circumstances of the failure. In this paper we present why and how available resources should also be considered, together with the type and circumstances of the failure, when selecting a recovery strategy. Changes in the environment, such as reduced resources at the node level (e.g., system overload) or degradation of QoS (e.g., scarce bandwidth on communication links), should be considered before allocating a new process or task to another host and before taking reconfiguration decisions. A mathematical model for generating a composite indicator from sampled parameters is presented. The mechanism for monitoring resources at the node level is described, along with how it can be used in selecting a recovery action (e.g., restarting processes on, or migrating them to, overloaded nodes should be avoided). In addition, different recovery strategies can be selected based on the communication characteristics between distributed sites (e.g., depending on availability or on cost). For this paper we consider the case of two recovery strategies and present a mechanism for selecting the appropriate one. The fault-tolerant architecture integrating the QoS monitoring mechanism achieves dynamic reconfiguration of the recovery strategies based on changes in the environment. The QoS monitoring mechanism also improves the differentiation between node crashes and network problems for nodes suspected of failure. Another advantage of this mechanism is the dynamic adaptation of resource allocation, yielding an overall increase in application availability.
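
The abstract does not give the composite-indicator model or name the two recovery strategies, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the indicator is a weighted mean of sampled utilizations, that the two strategies are restarting in place and migrating to another node, and that graceful degradation is the fallback when no node has spare capacity. The weights, the threshold, and all names (`NodeSample`, `composite_load`, `select_recovery`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed weights and cut-off for illustration; the paper's model may differ.
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "bandwidth": 0.2}
OVERLOAD_THRESHOLD = 0.8


@dataclass
class NodeSample:
    """One monitoring sample for a node; every field is a fraction in [0, 1]."""
    name: str
    cpu: float        # CPU utilization
    memory: float     # memory utilization
    bandwidth: float  # share of link capacity currently in use


def composite_load(sample: NodeSample) -> float:
    """Weighted mean of the sampled utilizations (0 = idle, 1 = saturated)."""
    return sum(weight * getattr(sample, key) for key, weight in WEIGHTS.items())


def select_recovery(host: NodeSample, candidates: Sequence[NodeSample]) -> str:
    """Choose a recovery action for a failed process that was running on `host`."""
    # Restart in place only if the current host is not overloaded.
    if composite_load(host) < OVERLOAD_THRESHOLD:
        return f"restart on {host.name}"
    # Otherwise migrate, but never to a node that is itself overloaded.
    usable = [c for c in candidates if composite_load(c) < OVERLOAD_THRESHOLD]
    if not usable:
        return "graceful degradation"
    target = min(usable, key=composite_load)
    return f"migrate to {target.name}"
```

For example, a process failing on a lightly loaded host would be restarted there, while one failing on a saturated host would be moved to the least loaded candidate whose indicator is below the threshold.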
