A Strategy to Deal with Mass Small Files in HDFS

Abstract

HDFS performs poorly when storing and managing large numbers of small files, because they consume a large amount of memory on the single NameNode and cause massive seeking and hopping from DataNode to DataNode. Traditional solutions are efficient only for specific file sizes or file formats. In this paper, we evaluate the performance of several existing solutions, such as HBase and Avro. Then, to compensate for their inefficiency with medium-sized small files, we implement a merging and prefetching mechanism. Finally, to reduce the influence of different file size distributions, we present a strategy that applies different schemes to small files of different sizes. Performance comparison experiments show that the strategy improves the writing and reading performance of the original HDFS by about 70%.
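
The abstract does not give implementation details, so the following is only a minimal in-memory sketch of the general idea of merging small files behind an index, prefetching neighbouring files on a read, and picking a scheme by file size. It is not the authors' implementation. The names `MergedStore` and `choose_scheme`, the scheme labels, the two size thresholds, and the assignment of schemes to size ranges are all hypothetical, and no real HDFS, HBase, or Avro API is called.

```python
from dataclasses import dataclass, field


@dataclass
class MergedStore:
    """In-memory stand-in for one large merged file plus its index (hypothetical)."""
    blob: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)   # name -> (offset, length)
    order: list = field(default_factory=list)   # file names in write order
    cache: dict = field(default_factory=dict)   # prefetched name -> bytes

    def write(self, name: str, data: bytes) -> None:
        # Append the small file to the merged blob and record where it lives.
        self.index[name] = (len(self.blob), len(data))
        self.order.append(name)
        self.blob.extend(data)

    def read(self, name: str, prefetch: int = 2) -> bytes:
        # Serve from the prefetch cache when the file is already loaded.
        if name in self.cache:
            return self.cache[name]
        pos = self.order.index(name)
        # Load the requested file and the next `prefetch` files written after it.
        for neighbour in self.order[pos:pos + 1 + prefetch]:
            offset, length = self.index[neighbour]
            self.cache[neighbour] = bytes(self.blob[offset:offset + length])
        return self.cache[name]


def choose_scheme(size_bytes: int,
                  tiny_limit: int = 64 * 1024,
                  small_limit: int = 16 * 1024 * 1024) -> str:
    """Pick a storage scheme by file size.

    The thresholds and the mapping below are assumptions for illustration;
    the abstract only states that different schemes are used for different sizes.
    """
    if size_bytes < tiny_limit:
        return "existing_solution"      # assumed: e.g. a store such as HBase or Avro
    if size_bytes < small_limit:
        return "merge_and_prefetch"     # assumed: medium-sized small files
    return "plain_hdfs"                 # assumed: file is large enough to store directly


store = MergedStore()
store.write("a.txt", b"one")
store.write("b.txt", b"two")
store.write("c.txt", b"three")
first = store.read("a.txt", prefetch=1)  # also pulls b.txt into the cache
```

In this sketch a read of one file loads the files written immediately after it, on the assumption that files written together tend to be read together; a real deployment would need to validate that access pattern.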