
Automatic Checkpointing of NQS Batch Jobs on CRAY UNICOS Systems

Abstract

In most UNIX systems, long-running application programs are not protected against the loss of their accumulated CPU time in the event of a regular shutdown or a system crash. In contrast, the UNICOS operating system provides a checkpoint/restart facility, which makes it possible, for example, to recover NQS batch jobs after a regular system shutdown and reboot. However, there is still no function that periodically checkpoints running processes. This kind of checkpointing, which would minimize CPU time losses in the event of a system crash, is left entirely to the user. Unfortunately, most users do not bother with checkpointing. Therefore, a feature was developed at KFA that checkpoints NQS batch jobs automatically after a certain CPU time interval. The key component of this feature is a UNIX daemon that is activated together with each NQS request. We present a detailed description of the daemon and its user interface. Our experience in a production environment shows that this feature can drastically reduce the CPU time lost to system crashes.
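
The interval rule described in the abstract (take a checkpoint once a job has consumed a certain amount of additional CPU time) can be illustrated with a minimal sketch. This is not the KFA daemon: the function name `run_checkpoint_loop`, its parameters, and the idea of feeding it sampled CPU times are assumptions made for illustration only, and the `checkpoint` callback stands in for whatever the real daemon does to invoke the UNICOS checkpoint facility.

```python
from typing import Callable, Iterable, List


def run_checkpoint_loop(
    cpu_samples: Iterable[float],
    interval: float,
    checkpoint: Callable[[float], None],
) -> List[float]:
    """Trigger `checkpoint` each time the job has used `interval` more CPU seconds.

    cpu_samples: successive readings of the job's accumulated CPU time, in seconds
                 (hypothetical input; a real daemon would query the system).
    Returns the CPU times at which a checkpoint was triggered.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    taken: List[float] = []
    last = 0.0  # accumulated CPU time at the most recent checkpoint
    for cpu in cpu_samples:
        # Checkpoint only when a full interval of CPU time has been
        # consumed since the previous checkpoint.
        if cpu - last >= interval:
            checkpoint(cpu)
            taken.append(cpu)
            last = cpu
    return taken
```

With a one-hour interval (3600 CPU seconds), a crash loses at most the CPU time accumulated since the last triggered checkpoint, plus whatever elapsed between two samples.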

Bibliographic record

  • Authors

    Attig, Norbert; Sander, Volker

  • Year: 1993
  • Original format: PDF
  • Language: English
