首页>
外国专利>
Fault containment and error recovery in a scalable multiprocessor
Fault containment and error recovery in a scalable multiprocessor
展开▼
机译:可伸缩多处理器中的故障遏制和错误恢复
展开▼
页面导航
摘要
著录项
相似文献
摘要
A multi-processor computer system permits various types of partitions to be implemented to contain and isolate hardware failures. The various types of partitions include hard, semi-hard, firm, and soft partitions. Each partition can include one or more processors. Upon detecting a failure associated with a processor, the connection to adjacent processors in the system can be severed, thereby precluding corrupted data from contaminating the rest of the system. If an inter-processor connection is severed, message traffic in the system can become congested as messages become backed up in other processors. Accordingly, each processor includes various timers to monitor for traffic congestion that may be due to a severed connection. Rather than letting the processor continue to wait to be able to transmit its messages, the timers will expire at preprogrammed time periods and the processor will take appropriate action, such as simply dropping queued messages, to keep the system from locking up.
展开▼