Journal: Cluster Computing

Self healing in System-S

Abstract

Faults in a cluster are inevitable. The larger the cluster, the more likely some failure is to occur, whether in hardware, in software, or through human error. System-S software must detect and self-repair failures while carrying out its prime directive: enabling stream processing program fragments to be distributed and connected to form complex applications. Depending on the type of failure, System-S may be able to continue with little or no disruption to potentially tens of thousands of interdependent and heterogeneous program fragments running across thousands of nodes. We extend our previously presented work on the self-healing nature of the job manager component in System-S by presenting how it can handle failures of other system components, applications, and network infrastructure. We also evaluate the recoverability of the job management orchestrator component of System-S, considering crash failures with and without error propagation.