In a cluster computing environment, executable, checkpoint, and data files must be transferred between application submission and execution sites. As the memory footprint of cluster applications increases, saving and restoring the state of a computation in such an environment may require substantial network resources at both the start and the end of a CPU allocation. During the allocation, the application may also consume network bandwidth to periodically transfer a checkpoint back to the submission site or checkpoint server and to access remote data files.