Failure management is a key component of any attempt to provide a reliable environment. This article proposes a solution for increasing the reliability of distributed systems based on the Chord peer-to-peer overlay. Our solution aims to provide accurate failure information about the nodes in the system. This is a very difficult task in peer-to-peer networks because of their dynamic nature and the inability to obtain reliable data from failure detectors. We propose a failure history service that shares failure information between peer-to-peer nodes. This novel service ensures that the information about the current state of a node, as well as its failure history, is as accurate as possible even when a large number of nodes fail. The accurate data it provides can be used to analyze failures over time.