Journal: 《计算机系统应用》 (Computer Systems & Applications)

Optimization Strategy for Small-File Storage and Reading Based on HDFS

Abstract

This paper studies the HDFS distributed file system in depth. HDFS is very efficient when large files are accessed in a streaming fashion, but its efficiency in storing and reading massive numbers of small files is relatively low. To address this problem, the paper proposes a small-file merging strategy based on a relational database. First, a user file is created for each user. Second, when a user uploads a small file, the file's metadata is stored in the relational database and the file itself is appended to the user file. Finally, when the user reads a small file, it is read directly in streaming mode using that metadata. In addition, when a user reads a file smaller than one file block, a DataNode load-balancing strategy is applied: the DataNode that stores the data sends it directly to the client, which relieves pressure on the main server and improves file transfer efficiency. The experimental results show that the scheme remedies HDFS's weak support for storing and reading large numbers of small files and improves HDFS read and write performance on massive small files. The scheme is suitable for cloud storage systems that hold massive numbers of small files; it reduces NameNode memory consumption and improves file read and write efficiency.
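The merging strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a local file as a stand-in for the per-user HDFS file and SQLite as a stand-in for the relational database. The class name `SmallFileStore`, the table `file_meta`, and its columns are hypothetical names chosen for illustration. The DataNode load-balancing step is not modelled.

```python
import os
import sqlite3
import tempfile


class SmallFileStore:
    """Merge small files into one container file per user, indexed in SQL.

    Assumption: a local file stands in for the per-user HDFS file, and
    SQLite stands in for the relational database holding the metadata.
    """

    def __init__(self, root_dir, db_path=":memory:"):
        self.root_dir = root_dir
        self.db = sqlite3.connect(db_path)
        # Hypothetical schema: one row per small file, recording where its
        # bytes live inside the owner's container file.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS file_meta ("
            " user_id TEXT NOT NULL,"
            " file_name TEXT NOT NULL,"
            " byte_offset INTEGER NOT NULL,"
            " byte_length INTEGER NOT NULL,"
            " PRIMARY KEY (user_id, file_name))"
        )

    def _user_file(self, user_id):
        return os.path.join(self.root_dir, f"{user_id}.dat")

    def upload(self, user_id, file_name, data):
        """Append the small file to the user file and record its metadata."""
        with open(self._user_file(user_id), "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of the user file
            f.write(data)
        with self.db:  # commit the metadata row
            self.db.execute(
                "INSERT INTO file_meta VALUES (?, ?, ?, ?)",
                (user_id, file_name, offset, len(data)),
            )
        return offset

    def read(self, user_id, file_name):
        """Look up offset and length, then read only that byte range."""
        row = self.db.execute(
            "SELECT byte_offset, byte_length FROM file_meta"
            " WHERE user_id = ? AND file_name = ?",
            (user_id, file_name),
        ).fetchone()
        if row is None:
            raise KeyError(file_name)
        offset, length = row
        with open(self._user_file(user_id), "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SmallFileStore(tmp)
        store.upload("alice", "a.txt", b"hello")
        store.upload("alice", "b.txt", b"world!")
        print(store.read("alice", "b.txt"))
        store.db.close()
```

In this sketch the storage layer sees one large file per user regardless of how many small files are uploaded. This mirrors the abstract's claim that merging reduces the metadata the NameNode must hold, since the per-file bookkeeping moves into the database.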
