Automatic Checkpointing of NQS Batch Jobs on CRAY UNICOS Systems

Abstract

In most UNIX systems, long-running application programs are not protected against the loss of their accumulated CPU time in the event of a regular shutdown or a system crash. In contrast, the UNICOS operating system provides a checkpoint/restart facility that allows, for example, NQS batch jobs to be recovered after a regular system shutdown and reboot. However, there is still no function that periodically checkpoints running processes. This kind of checkpointing, which would minimize CPU time losses in the event of a system crash, is left entirely to the user. Unfortunately, most users do not bother with checkpointing. Therefore, a feature was developed at KFA that automatically checkpoints NQS batch jobs after a certain CPU time interval. The key component of this feature is a UNIX daemon that is started together with each NQS request. We present a detailed description of the daemon and its user interface. Our experience in a production environment shows that this feature can drastically reduce CPU time losses due to system crashes.
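The mechanism the abstract describes, a daemon started with each NQS request that checkpoints the job whenever another CPU-time interval has elapsed, can be modeled in a few lines. The following Python is a hypothetical sketch, not the KFA implementation: the class name `CheckpointDaemon`, the polling model, and the `take_checkpoint` callback (a stand-in for the actual UNICOS checkpoint call, whose interface is not described here) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckpointDaemon:
    """Hypothetical model of a per-job daemon that requests a checkpoint
    each time the job's accumulated CPU time advances by `interval`."""
    interval: float                            # CPU seconds between checkpoints (assumed user-settable)
    take_checkpoint: Callable[[float], None]   # stand-in for the real checkpoint call (assumption)
    last_checkpoint_cpu: float = 0.0           # CPU time at the most recent checkpoint
    checkpoints: List[float] = field(default_factory=list)

    def poll(self, cpu_time: float) -> bool:
        """Called periodically with the job's current CPU time.
        Returns True if a checkpoint was taken on this poll."""
        if cpu_time - self.last_checkpoint_cpu >= self.interval:
            self.take_checkpoint(cpu_time)
            self.last_checkpoint_cpu = cpu_time
            self.checkpoints.append(cpu_time)
            return True
        return False

    def loss_on_crash(self, cpu_time: float) -> float:
        """CPU time lost if the system crashes now and the job
        is restarted from its most recent checkpoint."""
        return cpu_time - self.last_checkpoint_cpu


if __name__ == "__main__":
    taken: List[float] = []
    daemon = CheckpointDaemon(interval=600.0, take_checkpoint=taken.append)
    # Simulate polling the job's CPU time every 100 CPU seconds.
    for t in range(0, 2000, 100):
        daemon.poll(float(t))
    print(daemon.checkpoints)            # [600.0, 1200.0, 1800.0]
    print(daemon.loss_on_crash(1950.0))  # 150.0
```

Under this model, a crash costs at most about one checkpoint interval plus one polling step of CPU time, instead of all the CPU time accumulated since the job started, which illustrates why periodic checkpointing limits crash losses.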

Bibliographic Record

Authors: Attig, Norbert; Sander, Volker
Year: 1993
Original format: PDF
Language: English

Similar Literature

1. Cray programming environments within containers on Cray XC systems [J]. Martinasso Maxime, Gila Miguel, Sawyer William. Concurrency, Practice and Experience. 2020, No. 20

2. Design of a Hybrid Genetic Algorithm for Parallel Machines Scheduling to Minimize Job Tardiness and Machine Deteriorating Costs with Deteriorating Jobs in a Batched Delivery System [J]. Saidi-Mehrabad Mohammad, Bairamzadeh Samira. Journal of Optimization in Industrial Engineering. 2018, No. 1

3. Optimal (r, nQ, T) batch-ordering policy under stationary demand [J]. A.G. Lagodimos, I.T. Christou, K. Skouri. International Journal of Systems Science. 2012, No. 7a9

4. Parameter Optimization of Job Scheduler Based on NQS Simulator [C]. Rika ITO. International Conference on Innovative Computing, Information and Control. 2008

5. Steady state model and automatic control algorithm of a municipal solid waste (MSW) batch gasification system [D]. Witte, Drew. 2014

6. Parallel Batch Scheduling of Deteriorating Jobs with Release Dates and Rejection [O]. Juan Zou, Cuixia Miao

7. How to write a plugin to export job, power, energy, and system environmental data from your Cray® XC™ system [O]. Steven Martin, Cary Whitney, David Rush. 2017