
NCluster: Using Multiple Active Name Nodes to Achieve High Availability for HDFS




Hadoop HDFS is an open source project from the Apache Software Foundation for scalable, distributed computing and data storage. HDFS has become a critical component of today's cloud computing environments and of a wide range of applications built on top of it. However, the initial design of HDFS introduced a single point of failure: HDFS contains only one active name node, and if this name node experiences a software or hardware failure, the whole HDFS cluster is unusable until the name node has recovered. This is why people are reluctant to deploy HDFS for applications that require high availability. In this paper, we present a solution that enables high availability for the HDFS name node through efficient metadata replication. Our solution has two major advantages over existing ones. First, we use multiple active name nodes, instead of one, to build a cluster that serves metadata requests simultaneously. Second, we implement a pub/sub system to handle metadata replication across these active name nodes efficiently. Based on this solution, we implement a prototype called NCluster and integrate it with HDFS. We also evaluate NCluster to demonstrate its feasibility and effectiveness. The experimental results show that our solution performs well, with low replication cost, good throughput, and good scalability.
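The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: several active name nodes that each accept metadata requests and replicate changes to their peers through a publish/subscribe channel. Every name here (`Broker`, `NameNode`, `MetadataUpdate`, the `"metadata"` topic, the `create`/`delete` operations) is a hypothetical illustration and not NCluster's or HDFS's actual API. The sketch runs in a single process and deliberately ignores ordering conflicts, node failures, and network transport.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class MetadataUpdate:
    """One namespace change (hypothetical message format)."""
    origin: str  # id of the name node that accepted the client request
    seq: int     # per-origin sequence number
    op: str      # "create" or "delete"
    path: str


Handler = Callable[[MetadataUpdate], None]


class Broker:
    """In-process pub/sub broker; a stand-in for a real messaging layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, update: MetadataUpdate) -> None:
        # Deliver synchronously to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(update)


class NameNode:
    """An active name node holding a full replica of the namespace."""

    TOPIC = "metadata"  # assumed topic name

    def __init__(self, node_id: str, broker: Broker) -> None:
        self.node_id = node_id
        self.namespace: Set[str] = set()
        self._seq = 0
        self._broker = broker
        broker.subscribe(self.TOPIC, self._on_update)

    def handle_client_request(self, op: str, path: str) -> None:
        """Apply a client's metadata change locally, then publish it to peers."""
        self._seq += 1
        update = MetadataUpdate(self.node_id, self._seq, op, path)
        self._apply(update)
        self._broker.publish(self.TOPIC, update)

    def _on_update(self, update: MetadataUpdate) -> None:
        if update.origin == self.node_id:
            return  # our own update was already applied locally
        self._apply(update)

    def _apply(self, update: MetadataUpdate) -> None:
        if update.op == "create":
            self.namespace.add(update.path)
        elif update.op == "delete":
            self.namespace.discard(update.path)
        else:
            raise ValueError(f"unknown operation: {update.op}")


if __name__ == "__main__":
    broker = Broker()
    nodes = [NameNode(f"nn{i}", broker) for i in range(3)]
    # Different name nodes accept different client requests.
    nodes[0].handle_client_request("create", "/data/a.txt")
    nodes[1].handle_client_request("create", "/data/b.txt")
    nodes[2].handle_client_request("delete", "/data/a.txt")
    for node in nodes:
        print(node.node_id, sorted(node.namespace))
```

In this toy setting any node can serve a request and all replicas converge on the same namespace. A real system would additionally need to handle concurrent conflicting updates, message loss, and nodes joining or leaving, none of which the abstract specifies.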


