Conference: Latin-American Symposium on Dependable Computing

Improving Data Availability in HDFS through Replica Balancing

Abstract

Over time, the data distribution across an HDFS cluster may become unbalanced. The HDFS Balancer is a tool provided by Apache Hadoop that redistributes blocks by moving them from nodes with higher utilization to nodes with lower utilization. However, when it rearranges blocks, the HDFS Balancer does not try to increase data availability. This work presents a strategy that prioritizes block movements that increase the overall availability of the data stored in HDFS. This in turn increases fault tolerance, since spreading blocks across a higher number of racks tends to reduce the chance of data loss. To evaluate the implementation, an experimental investigation was conducted to measure system performance after balancing the cluster with the proposed solution.
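The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation or the HDFS Balancer's code. It classifies datanodes as over- or under-utilized using a threshold around the cluster average (loosely modeled on the Balancer's `-threshold` option), enumerates candidate moves of a replica from an over-utilized node to an under-utilized one, and ranks them so that moves which increase the number of distinct racks holding a block come first. The names `Move`, `rack_gain`, and `plan_moves`, the in-memory dictionaries standing in for cluster metadata, and the rule that discards moves which shrink a block's rack spread are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Hypothetical candidate move of one replica of `block` from `source` to `target`."""
    block: str
    source: str
    target: str


def rack_gain(replica_nodes, move, rack_of):
    """Change in the number of distinct racks holding the block if `move` is applied."""
    before = {rack_of[n] for n in replica_nodes}
    after_nodes = (set(replica_nodes) - {move.source}) | {move.target}
    after = {rack_of[n] for n in after_nodes}
    return len(after) - len(before)


def plan_moves(replicas, rack_of, utilization, threshold=10.0):
    """Rank candidate moves, highest rack gain first.

    replicas:    block id -> set of datanodes holding a replica
    rack_of:     datanode -> rack id
    utilization: datanode -> percent of capacity used
    threshold:   allowed deviation (percentage points) from the cluster average
    """
    avg = sum(utilization.values()) / len(utilization)
    over = {n for n, u in utilization.items() if u > avg + threshold}
    under = sorted(n for n, u in utilization.items() if u < avg - threshold)

    candidates = []
    for block, nodes in sorted(replicas.items()):
        for src in sorted(nodes):
            if src not in over:
                continue  # only drain over-utilized nodes
            for dst in under:
                if dst in nodes:
                    continue  # target already holds a replica of this block
                move = Move(block, src, dst)
                gain = rack_gain(nodes, move, rack_of)
                if gain < 0:
                    continue  # assumption: never shrink a block's rack spread
                candidates.append((gain, move))

    # Availability-improving moves first; ties broken deterministically by name.
    candidates.sort(key=lambda c: (-c[0], c[1].block, c[1].source, c[1].target))
    return candidates


if __name__ == "__main__":
    rack_of = {"dn1": "r1", "dn2": "r1", "dn3": "r1", "dn4": "r2", "dn5": "r2"}
    replicas = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn1", "dn4", "dn5"}}
    utilization = {"dn1": 90.0, "dn2": 85.0, "dn3": 40.0, "dn4": 30.0, "dn5": 50.0}
    for gain, move in plan_moves(replicas, rack_of, utilization):
        print(gain, move)
```

In this toy cluster, all three replicas of `blk_1` sit on rack `r1`, so moving either of its over-utilized replicas to `dn4` on rack `r2` raises its rack count and is ranked ahead of the `blk_2` move, which only rebalances space. A real balancer would also have to account for bytes moved, bandwidth limits, and concurrent moves, which this sketch ignores.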