Journal of Parallel and Distributed Computing

Modeling and optimization of non-blocking checkpointing for optimistic simulation on myrinet clusters

Abstract

The Checkpointing-and-Communication Library (CCL) is a recently developed software library that implements CPU-offloaded, non-blocking checkpointing in support of optimistic parallel simulation on Myrinet clusters. It achieves this by exploiting the data transfer capabilities of the programmable DMA engine on board Myrinet network cards. For several reasons, such as maintaining data consistency, CPU and DMA activities must sometimes be re-synchronized, which adds overhead to non-blocking checkpoint operations that would otherwise cost the CPU nothing. In this paper we present a detailed cost model for non-blocking checkpointing and derive from it a performance-effective re-synchronization semantic, which we call minimum cost re-synchronization. Under this semantic, each re-synchronization either commits the ongoing DMA-based checkpoint operation, causing CPU activities to be suspended, or aborts it, possibly increasing the expected rollback cost because fewer checkpoints are committed. The choice between the two is made on the basis of the minimum expected overhead, as evaluated through the cost model.
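
The commit-or-abort decision can be illustrated with a deliberately simplified sketch. This is not the paper's cost model, which is more detailed. The sketch assumes that committing costs the CPU stall time while the DMA engine finishes the remaining transfer at a constant bandwidth. It also assumes that aborting costs the expected re-execution of the events since the last committed checkpoint, weighted by the probability that a rollback would have restored from the aborted checkpoint. All names (`PendingCheckpoint`, `commit_cost`, `abort_cost`, `resynchronize`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingCheckpoint:
    """Hypothetical state of a DMA-based checkpoint still in progress."""
    remaining_bytes: int           # bytes the DMA engine has yet to copy
    dma_bandwidth: float           # assumed constant DMA throughput, bytes/s
    events_since_last_commit: int  # events to redo if this checkpoint is lost
    mean_event_time: float         # assumed mean event execution time, s
    rollback_probability: float    # assumed chance a rollback would restore from it


def commit_cost(cp: PendingCheckpoint) -> float:
    """Assumed CPU stall time: the CPU waits until the DMA transfer finishes."""
    return cp.remaining_bytes / cp.dma_bandwidth


def abort_cost(cp: PendingCheckpoint) -> float:
    """Assumed expected extra rollback cost: with this checkpoint discarded,
    a rollback restores an older one and re-executes the intervening events."""
    return (cp.rollback_probability
            * cp.events_since_last_commit
            * cp.mean_event_time)


def resynchronize(cp: PendingCheckpoint) -> tuple[str, float]:
    """Pick the action with the lower expected overhead (ties favor commit)."""
    commit, abort = commit_cost(cp), abort_cost(cp)
    if commit <= abort:
        return "commit", commit
    return "abort", abort


if __name__ == "__main__":
    large = PendingCheckpoint(65536, 128e6, 50, 20e-6, 0.1)
    small = PendingCheckpoint(4096, 128e6, 50, 20e-6, 0.1)
    print(resynchronize(large)[0])  # abort: 512 us stall vs ~100 us expected redo
    print(resynchronize(small)[0])  # commit: 32 us stall vs ~100 us expected redo
```

Under these assumptions, committing wins when little data is left to copy, while aborting wins when the remaining transfer is large relative to the expected cost of losing the checkpoint.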