INTER-PROCESSOR COMMUNICATIONS FAULT HANDLING IN HIGH PERFORMANCE COMPUTING NETWORKS

Abstract

A computer-implemented method and system for inter-processor communications fault handling in high performance computing networks. The method includes detecting that an InfiniBand (IB) queue pair has transitioned into an error state based on an unsuccessful completion status that relates to unsuccessful delivery of a message from an initiator endpoint at a first server device to at least one target endpoint at a second server device. The initiator and target endpoints are associated with at least one application under execution. An embodiment includes inferring, when the unsuccessful completion status is indicated as flushed, that the message was in a send queue of the IB queue pair when the IB queue pair transitioned into the error state. An embodiment includes establishing an IB Direct Connect queue pair connection between the target and initiator endpoints. An embodiment includes re-queueing the message in the IB queue pair for dispatch to the target endpoint.
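The detect, infer, and re-queue steps described in the abstract can be sketched as a small simulation. This is an illustrative model only, not the patented implementation: `WcStatus`, `Completion`, and `handle_completions` are hypothetical names, and the status values merely mirror the libibverbs completion codes `IBV_WC_SUCCESS`, `IBV_WC_WR_FLUSH_ERR`, and `IBV_WC_RETRY_EXC_ERR`. It assumes the initiator keeps a table of in-flight messages keyed by work-request ID. Reconnecting the endpoints is outside the sketch.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WcStatus(Enum):
    """Hypothetical stand-ins for libibverbs work-completion statuses."""
    SUCCESS = auto()        # mirrors IBV_WC_SUCCESS
    WR_FLUSH_ERR = auto()   # mirrors IBV_WC_WR_FLUSH_ERR ("flushed")
    RETRY_EXC_ERR = auto()  # mirrors IBV_WC_RETRY_EXC_ERR


@dataclass
class Completion:
    wr_id: int        # work-request ID of the message this completion refers to
    status: WcStatus


def handle_completions(completions, pending):
    """Classify completions from one queue pair.

    `pending` maps wr_id -> message for sends that have not completed yet.
    Returns (qp_in_error, requeue), where `requeue` lists the messages whose
    completion was flushed, i.e. that were still in the send queue when the
    queue pair entered the error state.
    """
    qp_in_error = False
    requeue = []
    for wc in completions:
        if wc.status is WcStatus.SUCCESS:
            # Delivered: the message no longer needs tracking.
            pending.pop(wc.wr_id, None)
            continue
        # Any unsuccessful completion means the queue pair is in the error state.
        qp_in_error = True
        if wc.status is WcStatus.WR_FLUSH_ERR:
            # Flushed: infer the message was still queued, so re-queue it.
            message = pending.pop(wc.wr_id, None)
            if message is not None:
                requeue.append(message)
        # Other errors: the abstract does not say how these are handled,
        # so this sketch simply leaves the message in `pending`.
    return qp_in_error, requeue
```

In this sketch the caller would re-establish a connection between the initiator and target endpoints and then post every message in `requeue` again. Keeping the non-flushed failure in `pending` is a choice made here for illustration, since the abstract covers only the flushed case.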
