International Conference on Advanced Computing

Efficient prefetching technique for storage of heterogeneous small files in Hadoop Distributed File System Federation



Abstract

Hadoop Distributed File System (HDFS) Federation [5] is used to store and manage large files. It has been used in a university scenario to store various categories of files such as PDFs, audio, video, presentation and image files. However, HDFS Federation suffers a performance penalty when storing a large number of small files. Moreover, scaling the namenodes in HDFS Federation does not solve the small files problem [7]; it only delays the accumulation of metadata. One approach to this problem was implemented in BlueSky [1], one of the most prevalent e-learning resources in China. However, that system does not handle files from heterogeneous users, and its prefetching mechanism takes into account only the locality of reference and does not consider file access patterns. The objective of this paper is to address these shortcomings by developing an efficient approach to handle files from heterogeneous users and by devising an efficient prefetching algorithm based on file access patterns. The file access patterns are stored and updated in a priority heap. Heterogeneous users can upload their files, and complete transparency is maintained when small files are grouped into a large file. Merging several small files into a large file reduces the memory footprint in Federated HDFS. In addition to the existing features, this paper also provides options to modify and delete the files stored by users in Federated HDFS. The original HDFS Federation and the proposed system are benchmarked with a set of 100,000 small files. The experimental results show that memory usage is reduced by 36% relative to the original HDFS Federation. With prefetching based on file access patterns, file read time is reduced by 94% compared to the proposed system without prefetching and by 92% compared to prefetching based on the locality of reference.
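The two ideas in the abstract, merging small files into one large file and prefetching from recorded access patterns, can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: the class names `MergedFileIndex` and `AccessPatternPrefetcher`, the "which file follows which" counting scheme, and the use of Python's `heapq` to pick the most frequent followers are all assumptions made for illustration, and no HDFS API is involved.

```python
import heapq
from collections import defaultdict


class MergedFileIndex:
    """Hypothetical stand-in for a merged file: one blob plus a name index."""

    def __init__(self):
        self.blob = bytearray()   # contents of all small files, concatenated
        self.index = {}           # file name -> (offset, length) in the blob

    def add(self, name, data):
        # Record where the small file starts and how long it is, then append it.
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def read(self, name):
        # A single index lookup replaces one metadata entry per small file.
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])


class AccessPatternPrefetcher:
    """Assumed scheme: count which file is read right after which other file."""

    def __init__(self):
        self.follow_counts = defaultdict(lambda: defaultdict(int))
        self.last_accessed = None

    def record_access(self, name):
        # Update the pattern: `name` was read immediately after `last_accessed`.
        if self.last_accessed is not None and self.last_accessed != name:
            self.follow_counts[self.last_accessed][name] += 1
        self.last_accessed = name

    def candidates(self, name, k=2):
        # Heap-based selection of the k files that most often follow `name`.
        followers = self.follow_counts.get(name, {})
        top = heapq.nlargest(k, followers.items(), key=lambda kv: (kv[1], kv[0]))
        return [follower for follower, _count in top]
```

In this sketch, a reader would call `record_access` on every read and then fetch the files returned by `candidates` from the `MergedFileIndex` ahead of time. A locality-of-reference prefetcher would instead pick neighbours by their position in the merged blob, regardless of how often they are actually read together.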
