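Hadoop, a mature distributed cloud platform, provides reliable and efficient storage for large files, but its efficiency drops markedly when it handles massive numbers of small files. This paper proposes a Hadoop-based storage optimization scheme for massive numbers of small educational-resource files. The scheme uses the correlations among these small files to merge them into large files, which reduces the file count, and it improves read efficiency through an index mechanism for accessing small files, metadata caching, and prefetching of correlated small files. Experimental results show that the method improves the efficiency of storing and accessing small files in the Hadoop file system.

The Hadoop Distributed File System (HDFS) can process large amounts of data effectively on large clusters. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. An approach based on HDFS is proposed to improve the storage efficiency of small files. The main idea is to classify the small files, merge them by class, and index the merged files, with the aim of reducing the number of index entries in the NameNode and improving storage efficiency. Experimental results show that the storage efficiency of small files is improved compared with Hadoop Archives (HAR files).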
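The merge-and-index idea can be illustrated with a minimal sketch. It is not the paper's implementation and does not use the HDFS API: it works on in-memory byte strings, and the names `merge_small_files` and `read_small_file` are hypothetical. It only shows how concatenating small files and keeping an (offset, length) index lets one large object stand in for many small ones; correlation-based grouping, metadata caching, and prefetching are omitted.

```python
import io
from typing import Dict, Tuple

# Hypothetical index layout: file name -> (offset, length) inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    buffer = io.BytesIO()
    index: Index = {}
    for name, data in files.items():
        # The current write position is the offset of this file in the blob.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_small_file(merged: bytes, index: Index, name: str) -> bytes:
    """Look up a small file in the index and slice it out of the merged blob."""
    offset, length = index[name]
    return merged[offset:offset + length]


if __name__ == "__main__":
    # Assumed example data: three small files belonging to related lessons.
    files = {
        "lesson1.txt": b"algebra notes",
        "lesson1.dat": b"\x00\x01\x02",
        "lesson2.txt": b"geometry notes",
    }
    merged, index = merge_small_files(files)
    print(read_small_file(merged, index, "lesson2.txt"))
```

In this sketch only the merged blob would need a file-system entry, while the per-file index is kept separately, which is the effect the abstract attributes to merging: fewer entries for the NameNode to track.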