Conference: International Conference on Computational Science, pt.1

Dynamic Fault Tolerance in Distributed Simulation System

Abstract

Distributed simulation systems are widely used for forecasting, decision-making, and scientific computing, and multi-agent and Grid platforms have been used to run them. To survive software or hardware failures and to guarantee the success rate of agent migration, such a system must solve the fault tolerance problem. Classic fault tolerance techniques such as checkpointing and redundancy can be applied to distributed simulation systems, but they are not efficient. We present a novel fault tolerance protocol that combines causal message logging with primary-backup technology. The proposed protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. The protocol produces no orphan states and does not require surviving agents to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that the protocol is efficient.
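The abstract does not give the protocol's details, so the following is only a minimal sketch of the general idea of pairing a primary agent with a backup that holds a checkpoint plus a message log. It is deliberately simplified: each message is logged at the backup before it is applied, which is closer to pessimistic logging than to true causal logging (where determinants are piggybacked on outgoing messages). All names here (`step`, `Backup`, `Primary`, `adapt_interval`) and the halve-or-double interval rule are hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import List


def step(state: int, msg: int) -> int:
    """Deterministic state transition (hypothetical): a running sum."""
    return state + msg


@dataclass
class Backup:
    """Backup copy of an agent: last checkpoint plus messages logged since."""
    state: int = 0
    log: List[int] = field(default_factory=list)

    def store_checkpoint(self, state: int) -> None:
        self.state = state
        self.log.clear()  # messages covered by the checkpoint are dropped

    def append(self, msg: int) -> None:
        self.log.append(msg)

    def recover(self) -> int:
        """Rebuild the primary's state by replaying logged messages in order."""
        state = self.state
        for msg in self.log:
            state = step(state, msg)
        return state


@dataclass
class Primary:
    backup: Backup
    interval: int = 4          # messages between checkpoints (assumed unit)
    state: int = 0
    since_checkpoint: int = 0

    def deliver(self, msg: int) -> None:
        self.backup.append(msg)              # log first, then apply
        self.state = step(self.state, msg)
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.interval:
            self.backup.store_checkpoint(self.state)
            self.since_checkpoint = 0


def adapt_interval(interval: int, recent_failure: bool,
                   lo: int = 1, hi: int = 64) -> int:
    """Assumed rule: checkpoint more often after a failure, less often otherwise."""
    if recent_failure:
        return max(lo, interval // 2)
    return min(hi, interval * 2)


if __name__ == "__main__":
    backup = Backup()
    primary = Primary(backup=backup, interval=3)
    for m in [1, 2, 3, 4, 5]:
        primary.deliver(m)
    # If the primary fails now, the backup replays its log from the checkpoint.
    print(primary.state, backup.recover())
```

Because the transition function is deterministic and every delivered message is logged, the backup can rebuild the failed agent's state on its own, which is why no surviving agent needs to roll back in this sketch. A shorter interval lowers replay time at recovery but raises checkpoint traffic, which is the trade-off an adaptive interval is meant to balance.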
