An HDFS-Based Storage Optimization Scheme for Small Files

Published in: 《计算技术与自动化》 (Computing Technology and Automation)

Abstract

The Hadoop Distributed File System (HDFS) performs well for big-data storage and is well suited to processing and storing large files, but its performance drops significantly when it handles massive numbers of small files, because too many small files drive up memory consumption across the system. To improve the efficiency with which HDFS handles small files, this paper modifies the HDFS storage scheme and proposes a storage optimization scheme for massive numbers of small files. Small files are first classified according to the correlation between them; files in the same class are then merged and uploaded together, and an index file is generated. On reads, a client-side caching mechanism is used to improve access efficiency. Experimental results show that, under rapid data growth, the scheme effectively improves small-file access efficiency, reduces system memory overhead, and improves the performance of HDFS when processing massive numbers of small files.
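The pipeline described in the abstract (classify, merge, index, cached read) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how correlation is measured, so grouping by top-level directory is only a stand-in, in-memory byte strings stand in for HDFS files, and the LRU eviction policy for the client cache is an assumption. All names (`group_files`, `merge_group`, `CachedReader`) are hypothetical.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, Tuple

# Index maps a small file's name to its (offset, length) in the merged blob.
Index = Dict[str, Tuple[int, int]]


def group_files(files: Dict[str, bytes],
                key: Callable[[str], str]) -> Dict[str, Dict[str, bytes]]:
    """Classify small files; `key` is a stand-in for the correlation criterion."""
    groups: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    for name, data in files.items():
        groups[key(name)][name] = data
    return dict(groups)


def merge_group(group: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate one class of small files and build its index."""
    merged = bytearray()
    index: Index = {}
    for name in sorted(group):  # sorted for a deterministic layout
        data = group[name]
        index[name] = (len(merged), len(data))
        merged += data
    return bytes(merged), index


class CachedReader:
    """Client-side reader with a small LRU cache (the policy is an assumption)."""

    def __init__(self, merged: bytes, index: Index, capacity: int = 2) -> None:
        self.merged = merged
        self.index = index
        self.capacity = capacity
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self.cache[name]
        offset, length = self.index[name]  # look up the slice in the index
        data = self.merged[offset:offset + length]
        self.misses += 1
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


files = {"logs/a.txt": b"alpha", "logs/b.txt": b"beta", "img/c.png": b"\x89PNG"}
groups = group_files(files, key=lambda name: name.split("/")[0])
merged, index = merge_group(groups["logs"])
reader = CachedReader(merged, index, capacity=1)
```

In this sketch a whole class of small files becomes a single stored object plus one index, which is the property the abstract relies on to reduce per-file metadata; a repeated `reader.read("logs/a.txt")` is then served from the client cache without touching the merged blob again.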
