IEEE Transactions on Parallel and Distributed Systems

Low-cost checkpointing and failure recovery in mobile computing systems

Abstract

A mobile computing system consists of mobile and stationary nodes connected to each other by a communication network. The presence of mobile nodes places constraints on the permissible energy consumption and the available communication bandwidth. To minimize the computation lost during recovery from node failures, a consistent snapshot of the system (a checkpoint) must be collected periodically. Locating mobile nodes adds to the checkpointing and recovery costs. Synchronous snapshot collection algorithms designed for static networks either force every node in the system to take a new local snapshot or block the underlying computation during snapshot collection; hence, they are not suitable for mobile computing systems. If nodes take their local checkpoints independently, in an uncoordinated manner, each node may have to store multiple local checkpoints in stable storage, which is unsuitable for mobile nodes because they have little memory. This paper presents a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot nor blocks the underlying computation during snapshot collection. When a node initiates snapshot collection, only those nodes that have directly or transitively affected the initiator since their last snapshots need to take local snapshots. We prove that global snapshot collection terminates within a finite time of its invocation and that the collected global snapshot is consistent. We also propose a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of one or more nodes. Both algorithms have low communication and storage overheads and meet the low-energy-consumption and low-bandwidth constraints of mobile computing systems.
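The abstract rests on two ideas: checkpoint only the nodes that directly or transitively affected the initiator, and roll back only the nodes that depend on undone operations. A small centralized model can illustrate both. The Python sketch below is not the authors' algorithm. It assumes a global view of every delivered message. It ignores messages in transit and concurrent snapshot requests, which are the hard part of a non-blocking distributed protocol. It also assumes each node's latest checkpoints form a consistent global state. The class `DependencyTracker`, its methods, and the per-node checkpoint sequence number (`csn`) bookkeeping are hypothetical names chosen for illustration.

```python
from collections import deque


class DependencyTracker:
    """Hypothetical, centralized model of inter-node dependencies.

    Assumptions (not from the paper): a global view of all deliveries, no
    messages in transit, no concurrent snapshot requests, and the latest
    checkpoints of all nodes always form a consistent global state.
    """

    def __init__(self, nodes):
        # Checkpoint sequence number per node (hypothetical bookkeeping).
        self.csn = {n: 0 for n in nodes}
        # received_from[r][s] = csn of s when s sent its latest message to r,
        # counting only messages r received since r's own last checkpoint.
        self.received_from = {n: {} for n in nodes}

    def deliver(self, sender, receiver):
        """Record that `receiver` processed a message from `sender`."""
        self.received_from[receiver][sender] = self.csn[sender]

    def _live_senders(self, node):
        """Senders whose current, uncheckpointed interval `node` depends on."""
        return {s for s, c in self.received_from[node].items() if c == self.csn[s]}

    def checkpoint_set(self, initiator):
        """Initiator plus every node that directly or transitively affected it
        since that node's last checkpoint (breadth-first closure)."""
        needed, queue = {initiator}, deque([initiator])
        while queue:
            node = queue.popleft()
            for sender in self._live_senders(node):
                if sender not in needed:
                    needed.add(sender)
                    queue.append(sender)
        return needed

    def take_checkpoints(self, nodes):
        """Each new checkpoint records all receipts so far, so clear them."""
        for n in nodes:
            self.csn[n] += 1
            self.received_from[n] = {}

    def rollback_set(self, failed):
        """Failed node plus every node that depends, directly or transitively,
        on operations undone when the failed node rolls back."""
        undone, queue = {failed}, deque([failed])
        while queue:
            node = queue.popleft()
            for other in self.csn:
                if other not in undone and node in self._live_senders(other):
                    undone.add(other)
                    queue.append(other)
        return undone

    def roll_back(self, nodes):
        """Restore each node to its last checkpoint: receipts since then are undone."""
        for n in nodes:
            self.received_from[n] = {}


if __name__ == "__main__":
    t = DependencyTracker(["A", "B", "C", "D"])
    t.deliver("C", "B")  # C affects B
    t.deliver("B", "A")  # B affects A, so C affects A transitively
    t.deliver("A", "D")  # D depends on A, but D did not affect A
    print(sorted(t.checkpoint_set("A")))  # ['A', 'B', 'C']
```

The sketch compares the recorded `csn` with the sender's current `csn`. That is one simple way to tell whether a dependency still refers to an uncheckpointed interval, so a node that has already checkpointed is not asked again. The paper's actual bookkeeping may differ. The model also tracks dependencies per checkpoint interval rather than per message. As a result, it may include a node whose message reached an intermediate node after that node had already affected the initiator. This errs on the side of extra checkpoints rather than missing ones.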