International Conference on Parallel and Distributed Processing Techniques and Applications; PDPTA'99

Low-overhead fault-tolerance for Java parallel applications on heterogeneous networked computers

Abstract

In this paper, we implement a software package that supports low-overhead fault-tolerance for Java parallel applications on the Java Message Passing System (JMPS). The package uses a coordinated checkpointing algorithm and an efficient recovery algorithm based on causal message logging, which improves asynchrony during recovery. As processor and communication network technologies have developed rapidly, large-scale distributed systems consisting of heterogeneous networked computers now provide high-performance parallel computing environments. As these systems scale up, however, their failure rate may also rise, so they require techniques that support low-overhead fault-tolerance. The coordinated checkpointing algorithm and the efficient recovery algorithm are used to meet this requirement. Moreover, because large-scale distributed systems consist of heterogeneous computers, a software package that provides fault-tolerance for them must be platform-independent. We therefore implement the package in Java, a highly portable programming language. Our performance evaluation results show that the software package operates at low cost.
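To make the idea of coordinated checkpointing concrete, the following is a minimal single-JVM sketch of a two-phase scheme: every process first takes a tentative snapshot, and the snapshots become the recovery point only after all processes have acknowledged. It is an illustration under simplifying assumptions, not the package described in the abstract. The names `CoordinatedCheckpointSketch`, `Worker`, `checkpoint`, and `demo` are hypothetical and are not part of JMPS, each process's state is reduced to one integer, and in-transit messages and causal message logging are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of two-phase coordinated checkpointing; not the JMPS API. */
public class CoordinatedCheckpointSketch {

    /** A simulated process whose entire state is a single counter (assumption). */
    static final class Worker {
        private int state;      // live application state
        private int tentative;  // phase-1 snapshot, not yet a recovery point
        private int committed;  // last globally agreed checkpoint

        void compute(int delta) { state += delta; }

        /** Phase 1: record a tentative snapshot and acknowledge. */
        boolean takeTentative() { tentative = state; return true; }

        /** Phase 2: promote the tentative snapshot to the committed checkpoint. */
        void commit() { committed = tentative; }

        /** Recovery: roll back to the last committed checkpoint. */
        void rollback() { state = committed; }

        int state() { return state; }
    }

    /** Coordinator: commit only if every worker acknowledged phase 1. */
    static boolean checkpoint(List<Worker> workers) {
        for (Worker w : workers) {
            if (!w.takeTentative()) {
                return false; // a missing acknowledgement aborts the round
            }
        }
        for (Worker w : workers) {
            w.commit();
        }
        return true;
    }

    /** Runs three workers, checkpoints, computes further, then simulates recovery. */
    static int[] demo() {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            workers.add(new Worker());
        }
        for (int i = 0; i < workers.size(); i++) {
            workers.get(i).compute(i + 1);   // states become 1, 2, 3
        }
        checkpoint(workers);                 // consistent checkpoint of 1, 2, 3
        for (Worker w : workers) {
            w.compute(10);                   // work done after the checkpoint
        }
        for (Worker w : workers) {
            w.rollback();                    // simulated failure: all roll back together
        }
        int[] result = new int[workers.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = workers.get(i).state();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [1, 2, 3]
    }
}
```

Because all workers commit in the same round, rolling back returns them to mutually consistent states. In a real distributed setting the coordinator would also have to account for messages in flight when the snapshots are taken, which this sketch does not attempt.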