Suicide among well-mannered cluster nodes experiencing heartbeat failure

Abstract

Methods are described for re-configuring a cluster computer system of two or more nodes when the cluster experiences a communications failure. First and second nodes of the cluster have respective channel controllers. A SCSI channel and the controllers communicatively connect the nodes. When a node becomes aware of a possible communications failure, the node attempts to determine the authenticity of the failure and responds according to the determined authenticity.

According to one method, a first node detects a failure of heartbeat (node-to-node) communications on the channel and then tests a physical drive on the channel. If the testing is successful, the first node kills the other node. If the testing is unsuccessful, the first node commits suicide.

In one embodiment, the coupling includes multiple channels communicatively coupling the first and second nodes, and the first node selects one of the channels for node-to-node communications. In this environment, choosing a physical drive involves testing node-to-node communications on another of the channels if no physical drive is online on the channel (and terminating the re-configuring method). If a drive is available, the first node uses the first physical drive online on the channel for testing.

In another method, the second node initially detects the communications failure and communicates that by attempting to negotiate with the first node for a new configuration of the computer system. The first node tests a physical drive in response and negotiates with the second node if the testing was successful. If the testing was unsuccessful, the first node commits suicide.
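The decision rules in the abstract can be sketched as a small piece of Python. This is only an illustration of the described logic, not the patent's implementation: the names `Action`, `choose_test_drive`, `on_heartbeat_failure`, and `on_negotiation_request` are hypothetical, and the `test_drive` callback stands in for whatever drive test a real controller would run over the SCSI channel.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Action(Enum):
    KILL_PEER = "kill_peer"                  # drive reachable: the peer is presumed at fault
    SUICIDE = "suicide"                      # drive unreachable: this node is presumed at fault
    NEGOTIATE = "negotiate"                  # drive reachable: agree on a new configuration
    TEST_OTHER_CHANNEL = "test_other_channel"  # no drive online: test another channel instead


def choose_test_drive(online_drives: Sequence[str]) -> Optional[str]:
    """Return the first physical drive online on the channel, or None if there is none."""
    return online_drives[0] if online_drives else None


def on_heartbeat_failure(online_drives: Sequence[str],
                         test_drive: Callable[[str], bool]) -> Action:
    """First method: this node itself detected the heartbeat failure."""
    drive = choose_test_drive(online_drives)
    if drive is None:
        # Multi-channel embodiment: test node-to-node communications on
        # another channel and end this re-configuration attempt.
        return Action.TEST_OTHER_CHANNEL
    return Action.KILL_PEER if test_drive(drive) else Action.SUICIDE


def on_negotiation_request(drive: str,
                           test_drive: Callable[[str], bool]) -> Action:
    """Other method: the peer detected the failure and asked to negotiate.

    Assumes a drive has already been chosen; the abstract does not say
    what happens in this method when no drive is online.
    """
    return Action.NEGOTIATE if test_drive(drive) else Action.SUICIDE
```

For example, `on_heartbeat_failure(["drive0"], lambda d: True)` yields `Action.KILL_PEER`, while the same call with a failing drive test yields `Action.SUICIDE`. The point of the drive test is that a node which can still reach shared storage over the channel has evidence that its own connection works, so the fault most likely lies with the peer.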