Workshop on Parallel and Distributed Simulation (conference paper)

Semi-asynchronous checkpointing for optimistic simulation on a Myrinet based NOW

Abstract

Great effort has been devoted to designing optimized checkpointing strategies for optimistic parallel discrete event simulators. Much less work, however, has addressed how any single checkpoint operation is executed. Checkpoint operations are typically charged to the CPU, so the simulation application is frozen while a checkpoint is in progress; that is, the checkpointing protocol typically runs in synchronous mode. In this paper we focus on improving this execution mode and present a software architecture, designed for Myrinet-based Networks of Workstations (NOWs), that avoids freezing the application during any checkpoint operation, thus moving checkpoint execution toward an asynchronous mode. This is achieved by charging checkpoint operations to a hardware component distinct from the CPU, namely a DMA engine. Fully asynchronous checkpointing, however, can suffer from data inconsistency whenever the content of a state buffer is accessed for further modification while a checkpoint operation involving that buffer has not yet completed. To avoid this, the architecture provides functionality for on-demand resynchronization. We have used this functionality to implement an execution mode of the checkpointing protocol that we refer to as semi-asynchronous. Based on the results of an experimental study, we argue that the semi-asynchronous mode can be an effective way to almost completely remove the delay associated with checkpoint operations from the completion time of the simulation.
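The semi-asynchronous control flow can be illustrated with a minimal Python sketch. It is not the authors' Myrinet/DMA architecture: a single-worker thread pool stands in for the DMA engine, and the names `SemiAsyncCheckpointer`, `checkpoint` and `before_write` are hypothetical. The point is the ordering: a checkpoint copy is posted and the simulation continues at once, and the CPU blocks only when it is about to modify a state buffer whose copy has not completed yet.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SemiAsyncCheckpointer:
    """Sketch of semi-asynchronous checkpointing (hypothetical interface).

    Assumption: a one-worker thread pool stands in for the DMA engine.
    A real Myrinet NIC performs the copy in hardware, not in a thread.
    """

    def __init__(self) -> None:
        self._dma = ThreadPoolExecutor(max_workers=1)  # stand-in "DMA engine"
        self._pending: dict[str, Future] = {}          # in-flight copy per buffer
        self.log: dict[str, list[bytes]] = {}          # saved images per buffer
        self.resyncs = 0                               # times the CPU had to wait

    def _copy(self, name: str, buf: bytearray) -> None:
        # Runs on the worker: snapshot the buffer into the checkpoint log.
        self.log.setdefault(name, []).append(bytes(buf))

    def checkpoint(self, name: str, buf: bytearray) -> None:
        # Asynchronous: post the copy and return without waiting.
        # With one FIFO worker, a newer copy of the same buffer completes
        # after any older one, so tracking only the latest future suffices.
        self._pending[name] = self._dma.submit(self._copy, name, buf)

    def before_write(self, name: str) -> None:
        # On-demand resynchronization: block only if a copy of this buffer
        # is still in flight, so the saved image cannot be corrupted.
        fut = self._pending.pop(name, None)
        if fut is not None:
            if not fut.done():
                self.resyncs += 1
            fut.result()

    def close(self) -> None:
        # Wait for all outstanding copies and stop the worker.
        self._dma.shutdown(wait=True)


if __name__ == "__main__":
    ckpt = SemiAsyncCheckpointer()
    state = bytearray(b"state-v1")
    ckpt.checkpoint("lp0", state)  # returns at once; copy proceeds in background
    # ... the CPU keeps processing events that do not touch `state` ...
    ckpt.before_write("lp0")       # waits only if the copy is still pending
    state[:] = b"state-v2"         # safe: the saved image is already complete
    ckpt.close()
    print(ckpt.log["lp0"])
```

The design choice mirrors the trade-off in the abstract: a fully asynchronous version would drop `before_write` and risk saving a half-modified buffer, while a synchronous version would call `fut.result()` inside `checkpoint` and stall on every copy. The semi-asynchronous version pays the wait only when a conflicting write actually occurs.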
