2018 IEEE International Congress on Big Data

Towards a Better Replica Management for Hadoop Distributed File System



Abstract

The Hadoop Distributed File System (HDFS) is the storage of choice when it comes to large-scale distributed systems. In addition to being efficient and scalable, HDFS provides high throughput and reliability through the replication of data. Recent work exploits this replication feature by dynamically varying the replication factor of in-demand data as a means of increasing data locality and achieving a performance improvement. However, to the best of our knowledge, no study has been performed on the consequences of varying the replication factor. In particular, our work is the first to show that although HDFS deals well with increasing the replication factor, it experiences problems with decreasing it. This leads to unbalanced data, hot spots, and performance degradation. In order to address this problem, we propose a new workload-aware balanced replica deletion algorithm. We also show that our algorithm successfully maintains the data balance and achieves up to 48% improvement in execution time when compared to HDFS, while only creating an overhead of 1.69% on average.
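The paper's central observation is that when the replication factor of a block is lowered, HDFS may remove replicas in a way that leaves some DataNodes overloaded. A minimal sketch of the general idea — workload-aware selection of which replicas to delete — is shown below. This is an illustrative heuristic written for this summary, not the authors' actual algorithm: the function name, the per-block data structures, and the unit-load accounting are all assumptions.

```python
# Illustrative sketch (NOT the paper's algorithm): when a block's
# replication factor is reduced, pick the replicas to delete from the
# most heavily loaded DataNodes first, so cluster storage stays balanced.

def balanced_replica_deletion(block_locations, node_load, new_factor):
    """Return, per block, the nodes whose replicas should be deleted.

    block_locations: dict mapping block id -> list of node names holding it
    node_load: dict mapping node name -> current storage load
    new_factor: target replication factor (deletion only happens when it
                is below a block's current replica count)
    """
    load = dict(node_load)  # work on a copy updated as replicas are removed
    deletions = {}
    for block, nodes in block_locations.items():
        excess = len(nodes) - new_factor
        if excess <= 0:
            deletions[block] = []  # nothing to delete for this block
            continue
        # Choose the currently most-loaded holders as deletion victims.
        victims = sorted(nodes, key=lambda n: load[n], reverse=True)[:excess]
        for n in victims:
            load[n] -= 1  # crude unit-cost model; real blocks vary in size
        deletions[block] = victims
    return deletions
```

For example, with block "b1" on nodes n1/n2/n3 and n1 the most loaded, lowering the factor to 2 deletes the n1 replica, relieving the hot node rather than an arbitrary one. Plain HDFS, by contrast, does not consider load balance this way when dropping replicas, which is the imbalance the paper targets.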


