INTER-PROCESSOR COMMUNICATIONS FAULT HANDLING IN HIGH PERFORMANCE COMPUTING NETWORKS
Abstract
A computer-implemented method and system for inter-processor communications fault handling in high performance computing networks. The method includes detecting that an InfiniBand (IB) queue pair has transitioned into an error state based on an unsuccessful completion status that relates to unsuccessful delivery of a message from an initiator endpoint at a first server device to at least one target endpoint at a second server device. The initiator and target endpoints are associated with at least one application under execution. An embodiment includes inferring, when the unsuccessful completion status is indicated as flushed, that the message was in a send queue of the IB queue pair when the IB queue pair transitioned into the error state. An embodiment includes establishing an IB Direct Connect queue pair connection between the target and initiator endpoints. An embodiment includes re-queueing the message in the IB queue pair for dispatch to the target endpoint.
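The recovery flow described in the abstract can be sketched as a small, self-contained simulation. This is an illustrative sketch only, not the patented implementation and not real InfiniBand code: `WcStatus`, `Completion`, `handle_fault`, and the `reconnect` callable are hypothetical names introduced here, and the `FLUSHED` status is assumed to play the role that `IBV_WC_WR_FLUSH_ERR` plays in the verbs API. The sketch also assumes that every unsuccessfully completed message is re-queued, while the flushed ones are additionally recorded as inferred to have still been in the send queue.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class WcStatus(Enum):
    """Hypothetical completion statuses, loosely modelled on verbs work completions."""
    SUCCESS = auto()
    RETRY_EXCEEDED = auto()  # stands in for any non-flush error status
    FLUSHED = auto()         # assumed analogue of IBV_WC_WR_FLUSH_ERR


@dataclass(frozen=True)
class Completion:
    wr_id: int        # identifies the send request this completion belongs to
    status: WcStatus


def handle_fault(completions, in_flight, reconnect):
    """Process completions; on any failure, reconnect and re-queue failed messages.

    completions: iterable of Completion objects polled from the completion queue.
    in_flight:   dict mapping wr_id to the message awaiting completion (mutated).
    reconnect:   hypothetical callable that establishes a replacement connection
                 and returns its (empty) send queue.
    Returns (new_send_queue or None, list of wr_ids reported as flushed).
    """
    failed = []
    flushed_ids = []
    for wc in completions:
        message = in_flight.pop(wc.wr_id)
        if wc.status is WcStatus.SUCCESS:
            continue
        # Any unsuccessful status is taken as evidence that the queue pair
        # has transitioned into the error state.
        if wc.status is WcStatus.FLUSHED:
            # Inference: this message was still in the send queue at that moment.
            flushed_ids.append(wc.wr_id)
        failed.append((wc.wr_id, message))
    if not failed:
        return None, flushed_ids
    send_queue = reconnect()   # establish the replacement connection
    send_queue.extend(failed)  # re-queue for dispatch, preserving original order
    return send_queue, flushed_ids


# Usage: three sends in flight; the second fails and the third is flushed.
pending = {1: b"a", 2: b"b", 3: b"c"}
events = [
    Completion(1, WcStatus.SUCCESS),
    Completion(2, WcStatus.RETRY_EXCEEDED),
    Completion(3, WcStatus.FLUSHED),
]
new_queue, flushed = handle_fault(events, pending, reconnect=deque)
```

Here `reconnect=deque` simply returns an empty queue to stand in for the newly established connection. Whether a message that failed with a non-flush status needs duplicate suppression at the target is not stated in the abstract, so the sketch leaves that question open.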