
A novel approach for efficient accessing of small files in HDFS: TLB-MapFile


Abstract

The Hadoop Distributed File System (HDFS) was originally designed for streaming access to large files, and its access and storage efficiency is low for massive numbers of small files. This paper presents TLB-MapFile, an access optimization approach for small files in HDFS based on MapFile. TLB-MapFile merges massive numbers of small files into large files through the MapFile mechanism to reduce NameNode memory consumption, and adds a fast table structure (TLB) in the DataNode to improve the retrieval efficiency of small files. First, small files are merged into large files according to the MapFile mechanism and stored in HDFS. Second, the access frequency and an ordered queue of small files (per unit time) are obtained from the HDFS system audit logs, and the mapping information between blocks and small files is stored in the TLB table, which is updated regularly. TLB-MapFile improves the access efficiency of small files through prefetching, using a priori strategies based on the TLB table. Experimental results show that this method can effectively reduce NameNode memory consumption and improve the reading speed of small files.
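
The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses plain in-memory Python structures as a stand-in for Hadoop's MapFile and for the TLB table, and every function name here (merge_small_files, access_counts, build_tlb, read_small_file) is hypothetical. The log lines are assumed to carry `cmd=open` and `src=<path>` fields in the style of HDFS audit logs, and the table capacity is an arbitrary parameter.

```python
import re
from collections import Counter

def merge_small_files(files):
    """Merge {name: bytes} into one blob plus an index of (offset, length).

    Keys are written in sorted order, as a MapFile requires; the blob and
    index are an in-memory stand-in, not the real Hadoop file format.
    """
    index, chunks, offset = {}, [], 0
    for name in sorted(files):
        data = files[name]
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index

# Assumed audit-log shape: "... cmd=open src=/path/to/file ..."
OPEN_RE = re.compile(r"cmd=open\s+src=(\S+)")

def access_counts(audit_lines):
    """Count open operations per path from audit-log-style lines."""
    counts = Counter()
    for line in audit_lines:
        match = OPEN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def build_tlb(counts, index, capacity):
    """Keep mapping entries for the `capacity` most frequently read files."""
    hot = [name for name, _ in counts.most_common() if name in index]
    return {name: index[name] for name in hot[:capacity]}

def read_small_file(name, blob, index, tlb):
    """Return (data, hit): consult the fast table first, then the full index."""
    hit = name in tlb
    offset, length = tlb[name] if hit else index[name]
    return blob[offset:offset + length], hit
```

Rebuilding the table periodically from fresh log lines would correspond to the regular updates mentioned in the abstract; the prefetching strategy itself is not modeled here.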
