Journal: Concurrency and Computation

Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

Abstract

Checkpointing has a crucial impact on a system's performance and fault-tolerance effectiveness: excessive checkpointing degrades performance, while insufficient checkpointing makes recovery expensive. In distributed systems with independent checkpoint activities, there is no easy way to determine checkpoint frequencies that optimize response time and fault-tolerance costs at the same time. This paper investigates the potential of a statistical decision-making procedure. We adopt a simulation-based approach to obtain performance metrics, which are then used to determine a trade-off between checkpoint interval reductions and performance efficiency. Statistical methodology, including experimental design, regression analysis and optimization, provides the framework for comparing configurations that may use different fault-tolerance mechanisms (replication-based or message-logging-based). Systematic research also allows us to take into account additional design factors, such as load balancing. The method is described in terms of a standardized object replication model (OMG FT-CORBA), but it could also be applied to other (e.g. process-based) computational models.
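
The trade-off described in the abstract can be illustrated with a minimal simulation sketch. This is not the paper's procedure: it models a single process rather than FT-CORBA replicated objects, it compares a handful of candidate intervals directly instead of using experimental design and regression, and every name and parameter value (`simulate_run`, `PARAMS`, the failure rate, the checkpoint and restart costs) is a hypothetical choice made only for illustration. Failures are assumed to arrive as a Poisson process, and no failure is assumed to occur during a restart.

```python
import random
from statistics import mean

def simulate_run(work, interval, ckpt_cost, restart_cost, failure_rate, rng):
    """Wall-clock time to finish `work` units with a checkpoint every `interval` units.

    Assumptions (illustrative only): exponential inter-failure times, a failure
    discards all progress since the last checkpoint, restarts never fail.
    """
    clock = 0.0
    remaining = work
    next_failure = rng.expovariate(failure_rate)
    while remaining > 0:
        segment = min(interval, remaining)
        needed = segment + ckpt_cost          # compute, then write a checkpoint
        if clock + needed <= next_failure:
            clock += needed                   # segment committed
            remaining -= segment
        else:
            clock = next_failure + restart_cost   # roll back to last checkpoint
            next_failure = clock + rng.expovariate(failure_rate)
    return clock

def mean_completion_time(interval, runs=200, seed=0, **params):
    """Average completion time over `runs` independent replications."""
    rng = random.Random(seed)
    return mean(simulate_run(interval=interval, rng=rng, **params)
                for _ in range(runs))

# Hypothetical workload: mean time between failures is 1 / 0.01 = 100 time units.
PARAMS = dict(work=1000.0, ckpt_cost=2.0, restart_cost=5.0, failure_rate=0.01)
CANDIDATES = [2, 5, 10, 20, 40, 80, 160]

def compare_intervals(candidates=CANDIDATES, **params):
    """Map each candidate checkpoint interval to its mean completion time."""
    return {t: mean_completion_time(t, **params) for t in candidates}

if __name__ == "__main__":
    results = compare_intervals(**PARAMS)
    for interval, avg in results.items():
        print(f"interval={interval:>4}  mean completion time={avg:8.1f}")
    print("best interval:", min(results, key=results.get))
```

Under these assumed parameters, very short intervals spend most of their time writing checkpoints and very long intervals lose large amounts of work on each failure, so the lowest mean completion time falls at an intermediate interval. A fuller study along the lines of the abstract would fit a regression model to such simulated metrics and optimize over it rather than picking from a fixed grid.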