首页> 中文期刊>电子科技大学学报 >基于Hadoop的小文件存储优化方案

基于Hadoop的小文件存储优化方案

     

摘要

Hadoop作为成熟的分布式云平台,对较大的文件提供了可靠高效的存储服务,但在处理海量小文件时效率显著降低。该文提出了基于Hadoop的海量教育资源小文件的存储优化方案,利用教育资源小文件间的关联关系,将小文件进行合并成大文件以减少文件数量,并索引机制访问小文件、元数据缓存和关联小文件预取机制来提高文件的读取效率。实验结果表明,该方法提高了Hadoop文件系统存储小文件的存取效率。%Hadoop distributes file system (HDFS) can process large amounts of data effectively through large clusters. However, HDFS is designed to handle large files and suffers performance penalty while dealing with large number of small files. An approach based on HDFS is proposed to improve storage efficiency of small files in HDFS. The main idea is to classify the mass small files, merge them by classes, and index the merged files aiming at reducing the amount of index items in namenodes and improving the storage efficiency. Experimental results show that the storage efficiency of small files is improved contrasting to Hadoop Archives (HAR files).

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号