Heartbeat detection is the most important method of fault detection in distributed systems. A heartbeat protocol allows two nodes to detect each other's state by exchanging messages periodically. However, two problems remain in heartbeat detection across multiple machines: disagreement among detection results and excessive overhead on the master-nodes. This paper proposes a heartbeat protocol based on multiple master-nodes (HPMM). HPMM resolves disagreement in detection results through voting and election among the master-nodes, and it also improves the system's continuous working time and availability. In addition, detection costs are reduced by distributing the workload across multiple master-nodes.
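The idea of resolving disagreement by voting can be illustrated with a minimal sketch. It is not the HPMM algorithm itself, whose details the abstract does not give. It assumes a simple majority rule and a fixed timeout, and the names `MasterNode`, `receive_heartbeat`, `local_verdict`, and `vote` are hypothetical, chosen only for illustration.

```python
class MasterNode:
    """A hypothetical master-node that tracks the last heartbeat per worker."""

    def __init__(self, name, timeout):
        self.name = name
        self.timeout = timeout  # assumed: seconds of silence before suspecting a worker
        self.last_seen = {}     # worker id -> time of the last heartbeat received

    def receive_heartbeat(self, worker, now):
        # Record the arrival time of a heartbeat from the given worker.
        self.last_seen[worker] = now

    def local_verdict(self, worker, now):
        # True if this master alone believes the worker is alive.
        last = self.last_seen.get(worker)
        return last is not None and (now - last) <= self.timeout


def vote(masters, worker, now):
    """Assumed rule: the worker is alive only if a strict majority says so."""
    alive_votes = sum(1 for m in masters if m.local_verdict(worker, now))
    return alive_votes > len(masters) / 2


# Three masters with a 3-second timeout; m3 missed the latest heartbeat.
masters = [MasterNode(f"m{i}", timeout=3) for i in (1, 2, 3)]
masters[0].receive_heartbeat("w1", now=10)
masters[1].receive_heartbeat("w1", now=10)
masters[2].receive_heartbeat("w1", now=5)

verdicts_at_12 = [m.local_verdict("w1", now=12) for m in masters]
group_at_12 = vote(masters, "w1", now=12)   # two of three masters say alive
group_at_20 = vote(masters, "w1", now=20)   # every master has timed out
```

In this sketch one master holds a stale view of `w1`, yet the group still reaches a single verdict, which is the kind of disagreement the abstract says voting is meant to resolve.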