A Strategy to Deal with Mass Small Files in HDFS

Abstract

HDFS performs poorly when storing and managing large numbers of small files, because the single Namenode consumes a great deal of memory and reads incur massive seeks and hops from datanode to datanode. Traditional solutions are efficient only for specific file sizes or file formats. In this paper, we evaluate the performance of several existing solutions, such as HBase and Avro. Then, to compensate for their inefficiency with medium-sized small files, we implement a merging and prefetching mechanism. Finally, to reduce the influence of different file size distributions, we present a strategy that uses different schemes for small files of different sizes. Performance comparison experiments show that the strategy improves the writing and reading performance of the original HDFS by about 70%.
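The abstract names a merging and prefetching mechanism but gives no implementation details. The following is only a minimal in-memory sketch of the general idea, not the authors' method: small files are packed into one container with an index of offsets, and a reader loads a few neighbouring files on each cache miss. The names `MergedFile` and `PrefetchingReader`, and the `window` parameter, are hypothetical and do not come from the paper or from any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class MergedFile:
    """Hypothetical container: many small files packed into one blob plus an index."""
    blob: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # file name -> (offset, length)

    def append(self, name: str, data: bytes) -> None:
        # Record where the file starts and how long it is, then pack its bytes.
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])


class PrefetchingReader:
    """Hypothetical reader: on a cache miss, also loads the next `window` files."""

    def __init__(self, merged: MergedFile, window: int = 2) -> None:
        self.merged = merged
        self.window = window          # assumed tuning knob, not from the paper
        self.cache = {}
        self.names = list(merged.index)  # insertion order equals packing order
        self.fetches = 0              # number of times the container was accessed

    def read(self, name: str) -> bytes:
        if name in self.cache:
            return self.cache[name]
        start = self.names.index(name)
        # Fetch the requested file and its neighbours in one pass.
        for neighbour in self.names[start:start + 1 + self.window]:
            self.cache[neighbour] = self.merged.read(neighbour)
        self.fetches += 1
        return self.cache[name]


merged = MergedFile()
for name, data in [("a", b"1"), ("b", b"22"), ("c", b"333"), ("d", b"4444")]:
    merged.append(name, data)

reader = PrefetchingReader(merged, window=2)
first_three = [reader.read(n) for n in ("a", "b", "c")]
```

In this sketch, reading `a`, `b`, and `c` in order touches the container only once, because the first miss also caches the next two files. A real deployment would keep the index compact and would have to choose the prefetch window against memory limits, neither of which the abstract specifies.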

Bibliographic details

Source
International Conference on Intelligent Human-Machine Systems and Cybernetics, 2014, pp. 331-334 (4 pages)
Authors
Zhang Shuo; Miao Li; Zhang Dafang; Wang Yuli
Original format
PDF
Keywords
File systems; Indexes; Information services; Memory management; Merging; Prefetching; Writing; Avro; HBase; HDFS; small files; strategy

Similar literature

1. Enhancing HDFS with a full-text search system for massive small files [J]. Xu Wentao, Zhao Xin, Lao Bin. Journal of Supercomputing, 2021, No. 7.
2. Pseudo-Cache-Based IoT Small Files Management Framework in HDFS Cluster [J]. Siddiqui Isma Farah, Qureshi Nawab Muhammad Faseeh, Chowdhry Bhawani Shankar. Wireless Personal Communications: An International Journal, 2020, No. 3.
3. An optimized method of HDFS for massive small files storage [J]. Jing Weipeng, Tong Danyu, Chen GuangSheng. Computer Science and Information Systems, 2018, No. 3.
4. A Strategy to Deal with Mass Small Files in HDFS [C]. Zhang Shuo, Miao Li, Zhang Dafang. International Conference on Intelligent Human-Machine Systems and Cybernetics, 2014.
5. P2PHDFS: An implementation of Statistic Multiplexed Computing Architecture in Hadoop File System [D]. Pradeep, Aakash. 2012.
6. Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format [O]. Svenn-Arne Dragly, Milad Hobbi Mobarhan, Mikkel E. Lepperød. 2018.
7. An optimization strategy of massive small files storage based on HDFS [O]. Xun Cai, Cai Chen, Yi Liang. 2018.
