International Journal of Business Intelligence and Data Mining

Master node fault tolerance in distributed big data processing clusters



Abstract

Distributed computing clusters are often built from commodity hardware, whose relatively low reliability leads to periodic failures of processing nodes. While worker node fault tolerance is straightforward, fault tolerance of the master node poses a bigger challenge. In this paper, master node failure handling is based on master and worker roles that can be dynamically reassigned to cluster nodes, combined with a backup of the master node state maintained on one of the worker nodes. With this approach, no special component is needed to monitor the health of the cluster, and master node failures can be resolved except when the master and the backup fail simultaneously. We present an experimental evaluation of an implementation of the technique and show benchmarks demonstrating that a failure of the master does not affect the running job, while a failure of the backup results in recomputation of only the last job step.
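
The role-reassignment idea in the abstract can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` and `Cluster` classes, the `run_step` and `fail` methods, and the `recomputed_steps` counter are all hypothetical names introduced here. Failure detection is not modeled (failures are injected by calling `fail`), and the recomputation after a backup failure is only counted, mirroring the behaviour the abstract reports rather than explaining its cause.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    role: str = "worker"  # hypothetical role labels: "master", "backup", "worker"
    state: dict = field(default_factory=dict)  # copy of the job state, if any


class Cluster:
    """Toy model: roles are labels that can move between live nodes."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.nodes[0].role = "master"
        self.nodes[1].role = "backup"
        self.recomputed_steps = 0  # steps that had to be run again

    def find(self, role):
        # First live node currently holding the given role, or None.
        return next((n for n in self.nodes if n.alive and n.role == role), None)

    def _assign_to_worker(self, role):
        node = self.find("worker")
        if node is None:
            raise RuntimeError("no live worker left to take the role")
        node.role = role
        return node

    def run_step(self, step):
        # The master records progress and mirrors its state to the backup.
        master, backup = self.find("master"), self.find("backup")
        master.state = {"last_step": step}
        backup.state = dict(master.state)

    def fail(self, *names):
        # Inject failures; several names model a simultaneous failure.
        for node in self.nodes:
            if node.name in names:
                node.alive = False
        master, backup = self.find("master"), self.find("backup")
        if master is None and backup is None:
            raise RuntimeError("master and backup failed simultaneously")
        if master is None:
            # The backup already holds the master state: promote it in place,
            # then pick a new backup among the remaining workers.
            backup.role = "master"
            self._assign_to_worker("backup").state = dict(backup.state)
        elif backup is None:
            # Assumption taken from the abstract: losing the backup costs a
            # recomputation of the last job step only.
            self.recomputed_steps += 1
            self._assign_to_worker("backup").state = dict(master.state)
```

For example, in a four-node cluster `Cluster(["a", "b", "c", "d"])` that has run two steps, calling `fail("a")` promotes `b` to master with its state intact and makes `c` the new backup. A later `fail("c")` makes `d` the backup and increments `recomputed_steps` once. Only `fail("b", "d")`, which removes master and backup together, is unrecoverable in this model.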
