Master node fault tolerance in distributed big data processing clusters

Ivan Gankevich; Yuri Tipikin; Vladimir Korkhov; Vladimir Gaiduchok; Alexander Degtyarev; A. Bogdanov

International Journal of Business Intelligence and Data Mining

Abstract

Distributed computing clusters are often built with commodity hardware, and the relatively low reliability of such hardware leads to periodic failures of processing nodes. While fault tolerance of worker nodes is straightforward, fault tolerance of the master node poses a bigger challenge. In this paper, master node failure handling is based on the concept of master and worker roles that can be dynamically reassigned to cluster nodes, combined with a backup of the master node state kept on one of the worker nodes. With this approach, no special component is needed to monitor the health of the cluster, and master node failures can be resolved in every case except a simultaneous failure of the master and its backup. We present an experimental evaluation of an implementation of the technique. Our benchmarks show that a failure of the master does not affect the running job, while a failure of the backup results in re-computation of only the last job step.
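The recovery behavior described above can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not the authors' implementation.

All of the following are hypothetical choices made here to reproduce the outcomes reported in the abstract:

- the role rule: the lowest alive node ID becomes master and the next lowest becomes backup;
- the names `Cluster`, `run_job` and `JobLost`;
- state replication at every step boundary;
- the assumption that the backup also computes part of each step while the master only coordinates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class JobLost(Exception):
    """Master and backup failed together: the one case this toy scheme cannot recover."""


@dataclass
class Cluster:
    alive: list[int]                      # IDs of nodes that are still up
    master: int = -1
    backup: int = -1
    master_state: list[int] = field(default_factory=list)  # results of completed steps
    backup_state: list[int] = field(default_factory=list)  # copy kept on the backup worker
    step_runs: int = 0                    # step executions, counting re-computations

    def assign_roles(self) -> None:
        # Hypothetical rule: lowest alive ID is master, next lowest is backup.
        # When the master dies, the old backup is therefore promoted automatically.
        if len(self.alive) < 2:
            raise JobLost("fewer than two nodes left")
        ordered = sorted(self.alive)
        self.master, self.backup = ordered[0], ordered[1]
        self.backup_state = list(self.master_state)  # (re)send state to the backup


def run_job(nodes: list[int], n_steps: int,
            failures: dict[int, set[int]]) -> Cluster:
    """Run n_steps sequential steps; failures maps a step index to nodes that die during it."""
    pending = dict(failures)              # each failure event fires only once
    c = Cluster(alive=list(nodes))
    c.assign_roles()
    step = 0
    while step < n_steps:
        c.step_runs += 1
        failed = pending.pop(step, set())
        if failed:
            if {c.master, c.backup} <= failed:
                raise JobLost(f"master and backup both failed in step {step}")
            lost_master = c.master in failed
            lost_backup = c.backup in failed
            c.alive = [n for n in c.alive if n not in failed]
            if lost_master:
                # The backup becomes master using its copy of the state,
                # so no completed step is lost and the job keeps running.
                c.master_state = list(c.backup_state)
            c.assign_roles()
            if lost_backup:
                # Assumption: the backup was also computing part of this step,
                # so only the current step is executed again.
                continue
        c.master_state.append(step * step)    # stand-in for the real step result
        c.backup_state = list(c.master_state)  # replicate state at the step boundary
        step += 1
    return c
```

The model was checked with nodes `[0, 1, 2, 3]` and five steps:

- Losing the master (node 0) in step 2 still needs only five step executions.
- Losing the backup (node 1) in step 2 needs six.
- Losing both in the same step raises `JobLost`.

Deriving the roles from a deterministic rule over the set of alive nodes echoes the abstract's point that no separate monitoring component is required. How failures are actually detected is outside the scope of this sketch.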

Bibliographic record

Source: International Journal of Business Intelligence and Data Mining, 2019, No. 2, pp. 158-172 (15 pages)
Authors: Ivan Gankevich; Yuri Tipikin; Vladimir Korkhov; Vladimir Gaiduchok; Alexander Degtyarev; A. Bogdanov
Affiliation: Dept. of Computer Modelling and Multiprocessor Systems, Saint Petersburg State University
Full-text format: PDF
Language: English
Keywords: parallel computing; big data processing; distributed computing; backup node; state transfer; delegation; cluster computing; fault-tolerance; high-availability; hierarchy

Similar documents

1. On Fault Tolerance for Distributed Iterative Dataflow Processing [J]. Chen Xu, Markus Holzemer, Manohar Kaul. IEEE Transactions on Knowledge and Data Engineering, 2017, No. 8.
2. A multi-factor monitoring fault tolerance model based on a GPU cluster for big data processing [J]. Fang Yuling, Chen Qingkui, Xiong Naixue. Information Sciences: An International Journal, 2019.
3. Fault-tolerant technology for big data cluster in distributed flow processing system [J]. Jia Zhicheng. Web Intelligence, 2020, No. 2.
4. Efficient fault-tolerance for iterative graph processing on distributed dataflow systems [C]. Chen Xu, Markus Holzemer, Manohar Kaul. IEEE International Conference on Data Engineering, 2016.
5. Exploiting Asynchrony for Performance and Fault Tolerance in Distributed Graph Processing [D]. Vora, Keval Dinesh. 2017.
6. A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring [O]. Adeyinka Akanbi, Muthoni Masinde. 2020.
7. An Object-Oriented View of Fragmented Data Processing for Fault and Intrusion Tolerance in Distributed Systems [O]. Jean-Charles Fabre, Brian Randell. 1992.
