Enhancing HDFS with a full-text search system for massive small files

Abstract

HDFS is a popular open-source system for scalable and reliable file management, designed as a general-purpose solution for distributed file storage. While it works well for medium and large files, its performance degrades heavily when it has to store large numbers of small files. To overcome this drawback, we propose a system that enhances HDFS with SAES, a distributed true full-text search system with 100% recall and precision. By indexing the metadata of each file (e.g., name, size, date and description), files can be located quickly through efficient metadata searches. Moreover, many small files are merged into one large file, which is stored with better space and I/O efficiency, so the negative performance impact of storing each small file individually is avoided. An experimental study covering function and performance tests is conducted on both realistic and artificial data. The results show that the system works well for file operations such as uploading, downloading and deleting. Moreover, the RAM consumption for managing massive small files is dramatically reduced, which is critical for good system performance. The proposed system could serve as a storage solution for massive small files.
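The two ideas in the abstract, merging small files into one large file and locating them through a metadata index, can be illustrated with a minimal Python sketch. It is not the authors' implementation: a local file stands in for the merged HDFS file, a plain in-memory inverted index stands in for SAES, and the class name `MergedStore` and its methods are hypothetical names chosen for illustration only.

```python
import os
import tempfile
import time
from collections import defaultdict


class MergedStore:
    """Hypothetical sketch: many small files packed into one container file,
    with an in-memory metadata index standing in for a full-text search system."""

    def __init__(self, container_path):
        self.container_path = container_path
        self.records = {}              # file name -> metadata dict
        self.terms = defaultdict(set)  # lowercase token -> set of file names
        open(container_path, "ab").close()  # create the container if missing

    def upload(self, name, data, description=""):
        if name in self.records:
            self.delete(name)  # replace the old entry logically
        # The payload is appended, so its offset is the current container size.
        offset = os.path.getsize(self.container_path)
        with open(self.container_path, "ab") as f:
            f.write(data)
        self.records[name] = {
            "name": name,
            "size": len(data),
            "date": time.time(),
            "description": description,
            "offset": offset,
        }
        # Index name and description tokens so the file can be found by search.
        for token in (name + " " + description).lower().split():
            self.terms[token].add(name)

    def download(self, name):
        meta = self.records[name]
        with open(self.container_path, "rb") as f:
            f.seek(meta["offset"])
            return f.read(meta["size"])

    def delete(self, name):
        # Logical delete only: the bytes stay in the container file.
        meta = self.records.pop(name)
        for token in (name + " " + meta["description"]).lower().split():
            self.terms[token].discard(name)

    def search(self, query):
        # Return the names whose metadata contains every query token.
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self.terms.get(t, set()) for t in tokens))
        return sorted(hits)


if __name__ == "__main__":
    store = MergedStore(os.path.join(tempfile.mkdtemp(), "container.bin"))
    store.upload("a.txt", b"hello", "first sample note")
    store.upload("b.txt", b"world!", "second sample")
    print(store.search("sample"))
    print(store.download("b.txt"))
```

In this sketch the container holds a single byte stream, so the file system tracks one large object instead of one entry per small file; the per-file bookkeeping moves into the index. Deletion is logical only, which means a real system would additionally need some form of compaction to reclaim space.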

Bibliographic information

  • Source
    Journal of supercomputing | 2021, Issue 7 | pp. 7149-7170 | 22 pages
  • Author affiliations

    Sun Yat Sen Univ Sch Comp Sci & Engn Guangzhou Peoples R China;

    Sun Yat Sen Univ Sch Comp Sci & Engn Guangzhou Peoples R China;

    Guangdong Univ Foreign Studies Sch Informat Sci & Technol Guangzhou Peoples R China;

    Sun Yat Sen Univ Sch Comp Sci & Engn Guangzhou Peoples R China|Guangdong Prov Key Lab Informat Secur Technol Guangzhou Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Lots of small files; HDFS; Elasticsearch; Full-text search;
