NCAPS: application high availability in Unix computer clusters

Abstract

This paper presents a solution for improving the availability of applications running on a Unix computer cluster with two or more nodes. Tandem's NCAPS (NonStop Clusters Application Protection System) consists of specialized system software capable of recovering applications after hardware, software, or operating system failures. The main component of NCAPS, the PPM (Process Pairs Manager), uses a primary and warm-backup approach to achieve recovery times in the range of 10 seconds (for nodes having access to all needed resources), regardless of the application initialization time. This is a clear improvement over the recovery times of existing high availability (HA) solutions, which are typically on the order of 1 minute plus the application reinitialization time. The PPM manages an application through a configurable, user-specified state model in which state changes are triggered by detected failures or system administrator commands. Upon a state transition, the PPM sends a state change command message to registered application processes. Communication between the application processes and the PPM is achieved through a set of API (application programming interface) calls provided by OftLib (Open Fault Tolerance Library), also called FT-API. NCAPS is now available on Unix clusters composed of Tandem S4000 machines. A version that runs on NSC (NonStop Clusters), Tandem's SSI (Single System Image) product, for clusters of Compaq Proliant machines is under development.
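
The state-model mechanism described in the abstract can be illustrated with a small sketch. This is not the PPM or the OftLib/FT-API interface, neither of which is specified here; the class name, the state names (BACKUP, PRIMARY), and the event names are hypothetical and chosen only for illustration. The sketch assumes a user-supplied transition table, events standing in for detected failures or administrator commands, and registered callbacks standing in for application processes that receive a state change message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (current_state, event) -> next_state. All names are illustrative assumptions.
TransitionTable = Dict[Tuple[str, str], str]


@dataclass
class StateModelSketch:
    """Hypothetical stand-in for a manager driving a configurable state model."""
    transitions: TransitionTable
    state: str
    listeners: List[Callable[[str, str], None]] = field(default_factory=list)

    def register(self, callback: Callable[[str, str], None]) -> None:
        # A registered callback plays the role of a registered application process.
        self.listeners.append(callback)

    def handle(self, event: str) -> bool:
        # An event models a detected failure or an administrator command.
        key = (self.state, event)
        if key not in self.transitions:
            return False  # no transition configured for this state and event
        old_state, self.state = self.state, self.transitions[key]
        # On a state transition, notify every registered listener.
        for notify in self.listeners:
            notify(old_state, self.state)
        return True


if __name__ == "__main__":
    model = {
        ("BACKUP", "primary_failed"): "PRIMARY",  # failure-triggered takeover
        ("PRIMARY", "admin_switch"): "BACKUP",    # administrator-triggered switch
    }
    manager = StateModelSketch(transitions=model, state="BACKUP")
    manager.register(lambda old, new: print(f"state change: {old} -> {new}"))
    manager.handle("primary_failed")
```

Because the transition table is plain data, the state model stays configurable without changing the manager code, which is the property the abstract attributes to the PPM.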