Conference: International Conference on Computer Communication and Informatics
HAR+: Archive and metadata distribution! Why not both?

Abstract

The volume of data used in today's enterprises has grown rapidly over the last few years, and so has the need to process and analyze it. The Hadoop Distributed File System (HDFS) is an open-source Apache project designed to run on commodity hardware and to serve applications with large datasets (terabytes to petabytes). The HDFS architecture is based on a single master, the NameNode, which handles the metadata for a large number of slave nodes. For maximum efficiency, the NameNode keeps all of the metadata in RAM. As a result, when a cluster holds a huge number of small files, the NameNode often becomes a bottleneck for HDFS because it may run out of memory. Apache Hadoop provides Hadoop ARchive (HAR) to deal with small files, but HAR is not efficient in a multi-NameNode environment, which requires automatic scaling of metadata. In this paper, we design Hadoop ARchive Plus (HAR+), a hash-table-based architecture that uses SHA-256 as the key and is a modification of the existing HAR. HAR+ is designed to provide more reliability as well as automatic scaling of metadata. Instead of storing the metadata on one NameNode, HAR+ uses multiple NameNodes. Our results show that HAR+ significantly reduces the load on any single NameNode. Compared with Hadoop ARchive, this makes the cluster more scalable, more robust, and less prone to failure.
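
The abstract does not say how HAR+ maps a SHA-256 key to a particular NameNode. As a minimal sketch of the general idea only, the following assumes a simple modulo placement over a fixed number of NameNodes. The function names `namenode_for` and `distribute`, the modulo rule, and the in-memory dictionaries standing in for per-NameNode hash tables are all hypothetical and are not taken from the paper.

```python
import hashlib


def namenode_for(path: str, num_namenodes: int) -> int:
    """Pick a NameNode index for a file path from its SHA-256 digest.

    Assumption: modulo placement. The paper's actual rule is not given
    in the abstract.
    """
    if num_namenodes <= 0:
        raise ValueError("num_namenodes must be positive")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Interpret the 32-byte digest as one big integer, then reduce it.
    return int.from_bytes(digest, "big") % num_namenodes


def distribute(paths, num_namenodes):
    """Build one hash table per (hypothetical) NameNode.

    Each table maps the hex SHA-256 key of a path to the path itself,
    standing in for that file's metadata entry.
    """
    tables = [dict() for _ in range(num_namenodes)]
    for path in paths:
        key = hashlib.sha256(path.encode("utf-8")).hexdigest()
        tables[namenode_for(path, num_namenodes)][key] = path
    return tables


if __name__ == "__main__":
    small_files = [f"/data/small/file_{i}.txt" for i in range(1000)]
    for index, table in enumerate(distribute(small_files, 4)):
        print(f"NameNode {index}: {len(table)} metadata entries")
```

Because SHA-256 output is close to uniformly distributed, each table should end up holding roughly the same number of entries, which is the sense in which the load on any single NameNode drops. One known limitation of modulo placement is that changing the number of NameNodes remaps most keys. Consistent hashing is a common way to avoid that, though the abstract does not indicate which approach HAR+ takes.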