Journal: 计算机应用研究 (Application Research of Computers)

An Optimized Approach for Storing and Accessing Massive Small Files in HDFS


Abstract

To relieve the NameNode memory bottleneck that HDFS (Hadoop Distributed File System) encounters when storing massive numbers of small files, and to improve the efficiency with which HDFS handles them, this paper proposes a storage and access optimization scheme based on small-file merging and prefetching. First, the scheme analyzes a large volume of historical access logs to derive the correlations between small files; correlated small files are then merged into large files before being stored in HDFS. When data is read from HDFS, the files the user is most likely to access next are prefetched according to these correlations, which reduces the number of client requests to the NameNode and improves the file hit rate and processing speed. Experimental results show that the method effectively improves Hadoop's efficiency in storing and accessing massive small files and lowers the memory utilization of the NameNode.
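The merge-and-prefetch idea in the abstract can be sketched in a few lines. This is an illustrative assumption of one possible realization, not the paper's exact algorithm: the session format, the greedy grouping, and the `threshold` parameter are all hypothetical, and real storage would write each group into a single large HDFS file with a small local index.

```python
from collections import defaultdict
from itertools import combinations

def build_correlations(sessions):
    """Count how often each pair of small files is accessed in the same session,
    as mined from historical access logs."""
    counts = defaultdict(int)
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts

def group_correlated(sessions, threshold=2):
    """Greedily group files whose co-access count meets the threshold.
    Each group would then be merged into one large file before storage in HDFS."""
    counts = build_correlations(sessions)
    group_of = {}   # file -> index into groups
    groups = []
    for (a, b), n in sorted(counts.items(), key=lambda kv: -kv[1]):
        if n < threshold:
            continue
        if a in group_of and b not in group_of:
            groups[group_of[a]].append(b)
            group_of[b] = group_of[a]
        elif b in group_of and a not in group_of:
            groups[group_of[b]].append(a)
            group_of[a] = group_of[b]
        elif a not in group_of and b not in group_of:
            group_of[a] = group_of[b] = len(groups)
            groups.append([a, b])
        # if both files are already grouped, this greedy sketch leaves them as-is
    return groups

def prefetch_candidates(f, groups):
    """When the client reads f, also fetch the rest of f's merged group, so
    likely follow-up reads hit the client cache instead of the NameNode."""
    for g in groups:
        if f in g:
            return [x for x in g if x != f]
    return []
```

For example, with sessions `[["a","b","c"], ["a","b"], ["c","d"], ["a","b","d"]]` and `threshold=2`, only the pair ("a", "b") co-occurs often enough to be merged, so a read of "a" would trigger a prefetch of "b".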
