checkpointing; parallel processing; checkpoint frequency; closed-form formula; coordinated-uncoordinated checkpointing marriage; exascale era; failure recovery; in-memory task-local checkpoints; on-disk global checkpoints; optimal checkpoint interval; performance improvement; system-wide checkpointing scheme; task-level checkpointing; task-parallel HPC applications; unified nonhierarchical model; Checkpointing; Fault tolerance; Fault tolerant systems; Mathematical model; Parallel processing; Performance gain; Coordinated and Uncoordinated Checkpointing; Fault tolerance; HPC and Exascale; Task-based dataflow programming;
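Combining coordinated and uncoordinated checkpointing in pessimistic sender-based message logging
An uncoordinated asynchronous checkpointing model for hierarchical scientific workflows
Recoverability of distributed uncoordinated checkpointing
A marriage between coordinated and uncoordinated checkpointing for the exascale era
Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems.
The fission yeast Crb2/Chk1 pathway coordinates the DNA damage and spindle checkpoint in response to replication stress induced by topoisomerase I inhibitor.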