Fault Tolerance in Distributed Systems: A Survey

Abstract

Distributed systems can be homogeneous (clusters) or heterogeneous (Grid, Cloud and P2P systems). Several issues arise in these types of systems, such as quality of service (QoS), resource selection, load balancing and fault tolerance. Fault tolerance is a central concern in the design of distributed systems. When a hardware or software problem in the system causes a failure, we call that problem a fault. To allow the system to keep functioning even in the presence of such faults, designers must employ fault-tolerance techniques, whose goal is to detect and correct the resulting errors. In this paper, we first give an overview of the basic concepts of distributed systems and their failure types. We then present in detail the different fault-tolerance techniques used to identify and correct faults in different kinds of systems: clusters, grid computing, Cloud and P2P systems.
