HAR+: Archive and metadata distribution! Why not both?

Abstract

The volume of data used in today's enterprises has grown enormously over the last few years. At the same time, the need to process and analyze large volumes of data has also increased. The Hadoop Distributed File System (HDFS) is an open-source Apache project designed to run on commodity hardware and to serve applications with large datasets (TB, PB). The HDFS architecture is based on a single master (the NameNode), which handles the metadata for a large number of slaves. For maximum efficiency, the NameNode keeps all of the metadata in RAM. As a result, when dealing with a huge number of small files, the NameNode often becomes a bottleneck for HDFS because it may run out of memory. Apache Hadoop uses Hadoop ARchive (HAR) to deal with small files, but HAR is not efficient in a multi-NameNode environment, which requires automatic scaling of metadata. In this paper, we design Hadoop ARchive Plus (HAR+), a hash-table-based architecture that uses SHA-256 as the key and is a modification of the existing HAR. HAR+ is designed to provide more reliability and can also provide automatic scaling of metadata. Instead of using one NameNode to store the metadata, HAR+ uses multiple NameNodes. Our results show that HAR+ significantly reduces the load on a single NameNode. This makes the cluster more scalable, more robust, and less prone to failure than with Hadoop ARchive.
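The abstract does not spell out how a SHA-256 key selects a NameNode, so the following is only a minimal sketch of one plausible reading: hash the file path and use the digest to choose which NameNode holds that file's metadata entry. The function names (`sha256_key`, `pick_namenode`), the node labels, the example paths, and the modulo placement rule are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib
from collections import Counter


def sha256_key(path: str) -> str:
    """Return the SHA-256 hex digest of a file path (the hash-table key)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


def pick_namenode(path: str, namenodes: list) -> str:
    """Map a path to one NameNode by reducing its key modulo the node count.

    Assumption: simple modulo placement. The paper may use a different rule.
    """
    return namenodes[int(sha256_key(path), 16) % len(namenodes)]


# Hypothetical cluster of four NameNodes and 10,000 small files.
namenodes = ["nn0", "nn1", "nn2", "nn3"]
paths = [f"/archive/part-{i:05d}.txt" for i in range(10_000)]

# Count how many metadata entries each NameNode would hold.
load = Counter(pick_namenode(p, namenodes) for p in paths)
for node in namenodes:
    print(node, load[node])
```

Because SHA-256 output is close to uniformly distributed, each node in this sketch ends up with roughly a quarter of the entries, which is the load-spreading effect the abstract describes. Plain modulo placement remaps most keys whenever the number of NameNodes changes, so a design that scales metadata automatically would likely need something closer to consistent hashing; that is a general observation, not a claim about how HAR+ works.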

Bibliographic record

Source
International Conference on Computer Communication and Informatics, 2015, pp. 1-6 (6 pages)
Authors
Dev Dipayan; Patgiri Ripon
Original format: PDF
Keywords
Big Data; HAR; HDFS; Hadoop; Metadata; Small files

Similar literature

1. The Global Streamflow Indices and Metadata Archive (GSIM) – Part 1: The production of a daily streamflow archive and metadata [J]. Do Hong Xuan, Gudmundsson Lukas, Leonard Michael. Earth System Science Data. 2018, No. 2
2. The Global Streamflow Indices and Metadata Archive (GSIM) – Part 1: The production of a daily streamflow archive and metadata [J]. Do Hong Xuan, Gudmundsson Lukas, Leonard Michael. Earth System Science Data Discussions. 2018, No. 2
3. Metadata evaluation criteria in respect to archival maps description: A systematic literature review [J]. The Electronic Library. 2020, No. 1
4. HAR+: Archive and metadata distribution! Why not both? [C]. Dev Dipayan, Patgiri Ripon. International Conference on Computer Communication and Informatics. 2015
5. The uses of archival metadata for administration and resource discovery [D]. Harvey, Kathryn. 2005
6. Experiences with making diffraction image data available: what metadata do we need to archive? [O]. Loes M. J. Kroon-Batenburg, John R. Helliwell
7. Pengembangan Repository berbasis Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) pada Standar Metadata Encoding and Transmission Standard (METS) dan MPEG-21 Digital Item Declaration Language (DIDL) [O]. Taufiq Iqbal, Syarifuddin Syarifuddin. 2020
8. Performance of the Open Archives Protocol for Metadata Harvesting Applied to Federal Register Metadata [R]. Futrelle, J., Zhang, H. 2003
