A hierarchical fault detection and recovery in a computational grid using watchdog timers


Abstract

Grid computing applies the resources of many networked computers to a single problem or task at the same time. The drawback is that the computers performing the calculations may not always be trustworthy and may fail from time to time. The larger the number of nodes in the grid, the greater the probability that some node fails. To execute workflows in a fault-tolerant manner, we therefore adopt fault detection and recovery strategies. This paper proposes a method in which an instantaneous snapshot of the local state of the processes within each node is recorded. An efficient algorithm is introduced for detecting node failures using watchdog timers. For recovery, we use a divide-and-conquer algorithm that avoids redoing already completed jobs, enabling faster recovery.
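The abstract does not give the details of the detection algorithm, so the following is only a minimal sketch of the general idea of watchdog-timer failure detection, not the authors' method. All names (`WatchdogTimer`, `ClusterMonitor`, `jobs_to_redo`, `FakeClock`) and the heartbeat scheme are assumptions made for illustration: each node is expected to send a heartbeat that resets its timer, and a node whose timer expires is treated as failed. The `jobs_to_redo` helper only illustrates the stated goal of not redoing completed jobs; it does not implement the paper's divide-and-conquer recovery.

```python
import time


class WatchdogTimer:
    """Countdown for one node; a heartbeat resets it (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_kick = clock()

    def kick(self):
        # A heartbeat from the node restarts the countdown.
        self.last_kick = self.clock()

    def expired(self):
        # The node is presumed failed once it has been silent too long.
        return self.clock() - self.last_kick > self.timeout_s


class ClusterMonitor:
    """Watches the nodes of one cluster (one level of an assumed hierarchy)."""

    def __init__(self, node_ids, timeout_s, clock=time.monotonic):
        self.timers = {n: WatchdogTimer(timeout_s, clock) for n in node_ids}

    def heartbeat(self, node_id):
        self.timers[node_id].kick()

    def failed_nodes(self):
        return sorted(n for n, t in self.timers.items() if t.expired())


def jobs_to_redo(assigned_jobs, completed_jobs):
    """Return only those jobs of a failed node that have no recorded result."""
    return [j for j in assigned_jobs if j not in completed_jobs]


class FakeClock:
    """Manually advanced clock so the sketch can be exercised without waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

With a `FakeClock`, a monitor created with `timeout_s=5.0` reports a node as failed once more than five simulated seconds pass without a heartbeat from it. In a real deployment the default `time.monotonic` clock would be used, and the timeout would have to be tuned against network delay to limit false detections.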


Bibliographic details

Source
Proceedings of 2010 International Conference on Communication and Computational Intelligence | 2010 | pp. 467-471 | 5 pages
Authors
Bhagyashree A.H.; Pradeep D.; Jayanthy N.; Mounica K.V.; Nivejaa S.; Dharani P.S.
Original format
PDF
Chinese Library Classification
Theory and methods; Communication
Keywords
Grid computing; cluster; fault; load balancing; watchdog timer


Similar literature


1. Real-Time Hierarchical Neural Network Based Fault Detection and Isolation for High-Speed Railway System Under Hybrid AC/DC Grid [J] . Qin Liu, Tian Liang, Venkata Dinavahi. Power Delivery, IEEE Transactions on . 2020, No. 6

2. Fault Detection, Identification, and Location in Smart Grid Based on Data-Driven Computational Methods [J] . Jiang H., Zhang J.J., Gao W. Smart Grid, IEEE Transactions on . 2014, No. 6

3. FDR: fault detection and recovery scheme for wireless sensor networks using virtual grid [J] . Kulwardhan Singh, T. P. Sharma. Parallel Algorithms and Applications . 2017, No. 5a6

4. A hierarchical fault detection and recovery in a computational grid using watchdog timers [C] . Bhagyashree A.H., Pradeep D., Jayanthy N., International Conference on Communication and Computational Intelligence . 2010

5. Robust integration of multi-level fault detection mechanisms and recovery mechanisms in a component-based support middleware model for fault-tolerant real-time distributed computing. [D] . Zhou, Qian. 2009

6. Fault tolerance in computational grids: perspectives challenges and issues [O] . Sajjad Haider, Babar Nazir -1

7. Fault Tolerance and Recovery of Scientific Workflows on Computational Grids [O] . Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed 2013

8. Hierarchical Multiscale Particle Computational Method for Simulation of Nanoscale Flows on 3D Unstructured Grids [R] . Gatsonis, N. A. 2009

