Efficient prefetching technique for storage of heterogeneous small files in Hadoop Distributed File System Federation

Abstract

Hadoop Distributed File System (HDFS) Federation [5] is used to store and manage large files. It has been used in a university scenario to store various categories of files such as PDFs, audio, video, presentation and image files. However, HDFS Federation suffers a performance penalty when storing a large number of small files. Scaling the namenodes in HDFS Federation does not solve the small files problem [7]; it only delays the accumulation of metadata. One approach to this problem was implemented in BlueSky [1], one of the most prevalent e-learning resources in China. However, that system does not handle files from heterogeneous users, and its prefetching mechanism takes into account only locality of reference and does not consider file access patterns. The objective of this paper is to address these shortcomings by developing an efficient approach to handle files from heterogeneous users and by devising an efficient prefetching algorithm based on file access patterns. The file access patterns are stored and updated in a priority heap. Heterogeneous users can upload their files, and the grouping of small files into a large file is kept completely transparent to them. Merging several small files into a large file reduces the memory footprint in Federated HDFS. In addition to the existing features, this paper also provides options to modify and delete the files stored by users in Federated HDFS. The performance of the original HDFS Federation and of the proposed system is benchmarked with a set of 100,000 small files. The experimental results show that memory usage was reduced by 36% relative to the original HDFS Federation. With prefetching based on file access patterns, file read time was reduced by 94% compared to the proposed system without prefetching and by 92% compared to prefetching based on locality of reference.
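
The two ideas in the abstract, merging small files behind an index and choosing prefetch candidates from recorded access patterns, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names merge_small_files, read_file and AccessPatternPrefetcher are hypothetical, the "merged file" is an in-memory byte string rather than an HDFS block, and the access pattern is assumed to be a simple count of which file was read right after which. The sketch uses a heap-based top-k selection (heapq.nlargest) in place of the persistent priority heap the abstract mentions.

```python
import heapq
from collections import defaultdict


def merge_small_files(files):
    """Concatenate small files into one blob and record (offset, length) per file.

    `files` maps a file name to its bytes. One merged object needs a single
    metadata entry on the namenode side instead of one entry per small file.
    """
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_file(blob, index, name):
    """Slice one logical file back out of the merged blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


class AccessPatternPrefetcher:
    """Hypothetical tracker: counts which file is read right after which."""

    def __init__(self):
        # follow_counts[a][b] = number of times b was read immediately after a
        self.follow_counts = defaultdict(lambda: defaultdict(int))
        self.last_accessed = None

    def record_access(self, name):
        """Update the access pattern with one more read."""
        if self.last_accessed is not None:
            self.follow_counts[self.last_accessed][name] += 1
        self.last_accessed = name

    def candidates(self, name, k=2):
        """Return up to k files most often read after `name`, most frequent first."""
        followers = self.follow_counts.get(name, {})
        # heapq.nlargest keeps a bounded heap keyed on the observed count
        top = heapq.nlargest(k, followers.items(), key=lambda item: item[1])
        return [follower for follower, _ in top]
```

In use, a client would call record_access on every read and then fetch the files returned by candidates into a cache before they are requested. A prefetcher based only on locality of reference would instead pick the files stored next to the requested one in the merged blob, regardless of how often they are actually read together.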


Bibliographic record

Source
International Conference on Advanced Computing | 2013 | 8 pages
Conference location
Authors
Aishwarya K; Arvind Ram A; Sreevatson M C; Babu Chitra; Prabavathy B
Author affiliations

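Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: computing technology, computer technology
Keywords
Delays; Educational institutions; Heartbeat; Indexes; Lead; Merging; Prefetching; HDFS Federation; files access pattern; metadata; prefetching; small files problem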


Similar literature


1. Optimizing Read Operations of Hadoop Distributed File System on Heterogeneous Storages [J]. Lee Jongbaeg, Lee Jongwuk, Lee Sang-Won. Journal of Information Science and Engineering, 2021, No. 3.

2. The Design and Implementation of Appointed File Prefetching for Distributed File Systems [J]. Gwan-Hwan Hwang, Hsin-Fu Lin, Chun-Chin Sy. Journal of Research and Practice in Information Technology, 2008, No. 2.

3. Dynamic Merging based Small File Storage (DM-SFS) Architecture for Efficiently Storing Small Size Files in Hadoop [J]. Mohd Abdul Ahad, Ranjit Biswas. Procedia Computer Science, 2018, No. 1.

4. Efficient prefetching technique for storage of heterogeneous small files in Hadoop Distributed File System Federation [C]. Aishwarya K, Arvind Ram A, Sreevatson M C. International Conference on Advanced Computing, 2013.

5. Enabling Efficient and Dependable Clustered File Systems through New Erasure Coding Techniques [D]. Li, Runhui. 2015.

6. A comparative dosimetric study for treating left-sided breast cancer for small breast size using five different radiotherapy techniques: conventional tangential field, filed-in-filed, Tangential-IMRT, Multi-beam IMRT and VMAT [O]. Guang-Hua Jin, Li-Xin Chen, Xiao-Wu Deng. 2013.

7. Energy-efficient file placement techniques for heterogeneous mobile storage systems [O]. Young-jin Kim, Kwon-taek Kwon, Jihong Kim. 2012.
