Parallel computer system, computer-implemented method, computer-readable recording medium and computer program for dynamically rerouting node traffic on parallel computer system

Abstract

A method and apparatus for dynamically rerouting node processes on the compute nodes of a massively parallel computer system, using hint bits to route around failed nodes or congested networks without restarting the applications executing on the system. When a node fails, or there are indications that it may fail, the application software on the system is suspended while the data on the failed node is moved to a backup node. Torus network traffic is routed around the failed node, and traffic destined for the failed node is rerouted to the backup node. The application can then resume operation without restarting from the beginning.
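
The rerouting idea in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes dimension-ordered routing on a torus and models the hint bits as one direction (+1 or -1) per dimension, since the abstract does not describe their encoding. The names `path`, `choose_hints`, `retarget`, and `backup_of` are hypothetical.

```python
from itertools import product

def path(src, dst, dims, hints):
    """Walk a torus one dimension at a time; hints[i] is +1 or -1 for axis i.
    Returns every node visited after src, including dst."""
    node = list(src)
    hops = []
    for axis, step in enumerate(hints):
        while node[axis] != dst[axis]:
            node[axis] = (node[axis] + step) % dims[axis]  # wrap around the ring
            hops.append(tuple(node))
    return hops

def choose_hints(src, dst, dims, failed):
    """Assumed policy: try every hint combination, shortest path first,
    and return the first one whose path touches no failed node."""
    candidates = sorted(product((1, -1), repeat=len(dims)),
                        key=lambda h: len(path(src, dst, dims, h)))
    for hints in candidates:
        if not failed.intersection(path(src, dst, dims, hints)):
            return hints
    return None  # no dimension-ordered route avoids the failures

def retarget(dst, backup_of):
    """Traffic addressed to a failed node goes to its backup instead."""
    return backup_of.get(dst, dst)

if __name__ == "__main__":
    dims = (4, 4)
    failed = {(1, 0)}
    backup_of = {(1, 0): (3, 3)}  # hypothetical backup assignment
    dst = retarget((2, 0), backup_of)
    hints = choose_hints((0, 0), dst, dims, failed)
    print(hints, path((0, 0), dst, dims, hints))
```

In this sketch the sender at (0, 0) would normally reach (2, 0) through (1, 0); with that node marked as failed, the chosen hints send the packet the other way around the X ring. A message addressed to the failed node itself is first retargeted to its backup, matching the two behaviours the abstract describes.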
