【24h】

A Closer Look at Fault Tolerance

机译:仔细看看容错

获取原文

摘要

The traditional notion of fault tolerance requires that all the correct participating processes eventually terminate, and thus, is not sensitive to the number of correct processes that should properly terminate as a result of failures. Intuitively, an algorithm that in the presence of any number of faults always guarantees that all the correct processes except maybe one properly terminate, is more resilient to faults than an algorithm that in the presence of a single fault does not even guarantee that a single correct process ever terminates. However, according to the standard notion of fault tolerance both algorithms are classified as algorithms that can not tolerate a single fault. To overcome this difficulty, we generalize the traditional notion of fault tolerance in a way which enables to capture more sensitive information about the resiliency of an algorithm. Then, we present several algorithms for solving classical problems which are resilient under the new notion. It-is well known that, in an asynchronous systems where processes communicate either by reading and writing atomic registers or by sending and receiving messages, important problems such as, consensus, set-consensus, election, perfect renaming, implementations of a test-and-set bit, a shared stack, a swap object and a fetch-and-add object have no deterministic solutions which can tolerate even a single fault. We show that while, some of these problems have solutions which guarantee that in the presence of any number of faults most of the correct processes will properly terminate; other problems do not even have solutions which guarantee that in the presence of just one fault at least one correct process properly terminates.
机译:传统的容错概念要求所有正确的参与过程最终终止,因此,对应根据故障终止的正确进程的数量不敏感。直观地,一个算法在存在任何数量的故障时始终保证除了可能一个正确终止之外的所有正确的进程,比在存在单个故障的情况下甚至没有保证单个正确的算法更具弹性进程终止了。然而,根据容错的标准概念,这两个算法都被分类为无法容忍单个故障的算法。为了克服这种困难,我们以一种方式概括了传统的容错概念,这使得能够捕获有关算法弹性的更敏感的信息。然后,我们提出了几种用于解决新概念下的古典问题的算法。众所周知,在流程通过读写和写入原子寄存器或通过发送和接收消息进行通信的异步系统中,诸如共识,设施,共识,选举,完美重命名,测试的重要问题,以及测试的重要问题--Set位,共享堆栈,交换对象和获取和添加对象没有确定性解决方案,即使是单个故障也可以容忍。我们表明,其中一些问题具有解决方案,保证在存在任何数量的故障情况下,大多数正确的过程将适当终止;其他问题甚至没有解决方案,保证在只有一个故障的情况下至少有一个正确的过程正确终止。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号