首页> 外文期刊>Journal of Parallel and Distributed Computing >On termination detection in crash-prone distributed systems with failure detectors
【24h】

On termination detection in crash-prone distributed systems with failure detectors

机译:具有故障检测器的易崩溃的分布式系统中的终止检测

获取原文
获取原文并翻译 | 示例

摘要

We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n, M) and δ(n, M) denote the message complexity and detection latency, respectively, of A when the system has n processes and the underlying computation exchanges M application messages. The message complexity of B is O(n + μ(n, 0)) messages per failure more than the message complexity of A Also, its detection latency is O(δ(n, 0)) per failure more than that of A. Furthermore, application message size increases by at most log(f + 1) bits, where f is the actual number of processes that fail during an execution. We show that, when the communication topology is fully connected, under certain realistic assumption, any fault-tolerant termination detection algorithm can be forced to exchange Ω(nf) control messages in the worst-case even when at most one process may be active initially and the underlying computation does not exchange any application messages. This implies that our transformation is optimal in terms of message complexity when μ(n, 0) = O(n).rnThe fault-tolerant termination detection algorithm resulting from the transformation satisfies three desirable properties. First, it can tolerate the failure of up to n - 1 processes. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash. Third, it does not block the application at any time. Further, using our transformation, we derive a fault-tolerant termination detection algorithm that is the most efficient fault-tolerant termination detection algorithm that has been proposed so far to our knowledge. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.
机译:我们研究在进程可能因崩溃而失败的系统中检测分布式计算终止的问题。具体来说,当通信拓扑完全连接时,我们描述了一种将为无故障环境设计的任何终止检测算法A转换为可以容忍进程崩溃的终止检测算法B的方法。我们的转换假设存在完美的故障检测器。我们表明,即使在最多一个进程可能崩溃的情况下,实际上也需要一个完美的故障检测器来解决易崩溃的分布式系统中的终止检测问题。令μ(n,M)和δ(n,M)分别表示当系统具有n个进程并且基础计算交换M条应用程序消息时,A的消息复杂度和检测延迟。 B的消息复杂度比A的消息复杂度高出O(n +μ(n,0))条消息。此外,它的检测延迟比A的消息失效高O(δ(n,0))。此外,应用程序消息的大小最多增加log(f +1)位,其中f是在执行过程中失败的实际进程数。我们表明,在通信拓扑完全连接的情况下,在某些现实的假设下,即使最开始可能只有一个进程处于活动状态,在最坏的情况下,任何容错的终止检测算法都可以强制交换Ω(nf)控制消息。并且基础计算不交换任何应用程序消息。这意味着当μ(n,0)= O(n)时,我们的变换在消息复杂度方面是最优的。rn变换产生的容错终止检测算法满足三个理想的特性。首先,它可以容忍最多n-1个进程的失败。其次,它不会对故障敏感的终止检测算法施加任何开销,直到一个或多个进程崩溃为止。第三,它不会随时阻止应用程序。此外,通过我们的变换,我们得出了一种容错终止检测算法,这是迄今为止我们所知道的最有效的容错终止检测算法。我们的转换可以扩展到任意通信拓扑,只要进程崩溃不对系统进行分区即可。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号