Integrating several codes that simulate the physical processes or components of a nuclear energy facility enables large-scale, detailed simulation. However, such integrated simulations require several weeks or months of CPU time. We developed a fault-tolerant method for the cooperative execution of codes that tolerates unscheduled outages of computers or networks. The method handles abnormal job terminations on supercomputers as well as file transfer errors. If a computer suffers an unexpected outage, the method attempts to submit the simulation task to an alternative computer. The method also detects transfer errors by comparing the size of each file before and after transfer. The fault-tolerant method links jobs and file transfers, which allows the execution order of the codes to be determined by defining the flow of files between them. This enables integrated simulations in which codes are executed sequentially or concurrently.
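The three mechanisms named above (failover to an alternative computer, size-based transfer checking, and deriving execution order from file flow) can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the `run` callback, the use of `RuntimeError` to signal an outage, and the local `shutil.copyfile` standing in for a remote transfer are all hypothetical assumptions made for illustration. Codes are described as a mapping from a code name to the files it reads and writes; a code depends on any other code that writes a file it reads, and Python's `graphlib.TopologicalSorter` groups codes into stages that may run concurrently.

```python
import os
import shutil
from graphlib import TopologicalSorter


def copy_with_size_check(src, dst):
    """Copy a file (hypothetical local stand-in for a remote transfer) and
    report whether the file sizes before and after the transfer match."""
    shutil.copyfile(src, dst)
    return os.path.getsize(src) == os.path.getsize(dst)


def submit_with_failover(task, computers, run):
    """Try each computer in turn; return the name of the first one on which
    run(task, host) completes. A RuntimeError is assumed to signal an
    outage or abnormal job termination."""
    for host in computers:
        try:
            run(task, host)
            return host
        except RuntimeError:
            continue  # treat as an outage and move on to the next computer
    raise RuntimeError(f"all computers failed for task {task!r}")


def execution_stages(codes):
    """codes maps name -> (input_files, output_files). Returns a list of
    stages; codes within one stage have no file dependency on each other
    and could run concurrently, while stages run sequentially."""
    producer = {}
    for name, (_, outputs) in codes.items():
        for f in outputs:
            producer[f] = name
    # Each code's predecessors are the codes that produce the files it reads.
    graph = {
        name: {producer[f] for f in inputs if f in producer and producer[f] != name}
        for name, (inputs, _) in codes.items()
    }
    ts = TopologicalSorter(graph)
    ts.prepare()
    stages = []
    while ts.is_active():
        ready = ts.get_ready()
        stages.append(sorted(ready))
        ts.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical file flow: one code feeds two independent codes,
    # whose outputs are then combined by a fourth.
    codes = {
        "neutronics": ([], ["power.dat"]),
        "thermal": (["power.dat"], ["temp.dat"]),
        "structure": (["power.dat"], ["stress.dat"]),
        "report": (["temp.dat", "stress.dat"], []),
    }
    print(execution_stages(codes))
```

Run as a script, this prints `[['neutronics'], ['structure', 'thermal'], ['report']]`: the two codes that only read `power.dat` form a concurrent stage between sequential ones. Note that an equal file size is a cheap but weak check; it catches truncated transfers but not corruption that preserves length, for which a checksum comparison would be a stronger (though costlier) alternative.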