IEEE International Conference on Big Data Computing Service and Applications

LHF: A New Archive Based Approach to Accelerate Massive Small Files Access Performance in HDFS

Abstract

As one of the most popular open source projects, Hadoop is nowadays considered the de facto framework for managing and analyzing huge amounts of data. HDFS (Hadoop Distributed File System) is one of the core components of the Hadoop framework for storing big data, especially semi-structured and unstructured data. HDFS provides high scalability and reliability when handling large files across thousands of machines, but its performance degrades severely when dealing with massive numbers of small files. Although considerable effort has been spent investigating this well-known issue, existing approaches such as HAR, SequenceFile, and MapFile are limited in their ability to reduce the memory consumption of the NameNode while also optimizing access performance. In this paper, we present LHF, a solution that handles massive small files in HDFS by merging small files into big files and building a linear hashing based extendable index to speed up the process of locating a small file. The advantages of our approach are that (1) it significantly reduces the size of the metadata, (2) it does not require sorting the files at the client side, (3) it supports appending more small files to the merged file afterwards, and (4) it achieves good access performance. A series of experiments demonstrates the effectiveness and efficiency of LHF, which takes less time to access files than other methods.
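
To make the indexing idea concrete, the sketch below shows a minimal in-memory linear hashing index in Java that maps a small-file name to its (offset, length) location inside a merged file. It illustrates the general linear hashing technique only, under assumed names such as LinearHashIndex and Entry; the paper's actual on-disk index format and split policy are not reproduced here.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a linear hashing index: file name -> (offset, length) in a merged file.
// Class, field, and constant names are illustrative, not the paper's implementation.
public class LinearHashIndex {

    // Hypothetical record describing where a small file lives inside the merged file.
    public record Entry(String fileName, long offset, long length) {}

    private static final int INITIAL_BUCKETS = 4;  // N: number of buckets at level 0
    private static final int BUCKET_CAPACITY = 4;  // average load that triggers a split (assumption)

    private final List<List<Entry>> buckets = new ArrayList<>();
    private int level = 0;        // current splitting round
    private int splitPointer = 0; // next bucket to split in this round
    private int size = 0;

    public LinearHashIndex() {
        for (int i = 0; i < INITIAL_BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // h_level(key) = hash mod (N * 2^level); buckets already split this round use h_{level+1}.
    private int bucketFor(String key) {
        int h = key.hashCode() & 0x7fffffff;
        int b = h % (INITIAL_BUCKETS << level);
        return b < splitPointer ? h % (INITIAL_BUCKETS << (level + 1)) : b;
    }

    // Locate a small file by name; returns null if it is not indexed.
    public Entry get(String fileName) {
        for (Entry e : buckets.get(bucketFor(fileName))) {
            if (e.fileName().equals(fileName)) {
                return e;
            }
        }
        return null;
    }

    // Register a newly appended small file; the index grows one bucket at a time.
    public void put(String fileName, long offset, long length) {
        buckets.get(bucketFor(fileName)).add(new Entry(fileName, offset, length));
        size++;
        if (size > (long) buckets.size() * BUCKET_CAPACITY) {
            splitOneBucket();
        }
    }

    // Split the bucket at splitPointer by rehashing its entries with h_{level+1}.
    private void splitOneBucket() {
        buckets.add(new ArrayList<>());               // the new "image" bucket
        List<Entry> old = buckets.get(splitPointer);
        List<Entry> keep = new ArrayList<>();
        for (Entry e : old) {
            int b = (e.fileName().hashCode() & 0x7fffffff) % (INITIAL_BUCKETS << (level + 1));
            if (b == splitPointer) {
                keep.add(e);
            } else {
                buckets.get(b).add(e);                // moves to bucket splitPointer + N * 2^level
            }
        }
        buckets.set(splitPointer, keep);
        splitPointer++;
        if (splitPointer == (INITIAL_BUCKETS << level)) { // round complete: address space doubled
            level++;
            splitPointer = 0;
        }
    }
}

Because a lookup touches exactly one bucket and the table grows one bucket at a time, such an index can keep absorbing newly appended small files without re-sorting or rebuilding the whole structure, which matches the appendability and access-performance claims in the abstract.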