Journal of Parallel and Distributed Computing

Deister: A light-weight autonomous block management in data-intensive file systems using deterministic declustering distribution



Abstract

During the last few decades, Data-intensive File Systems (DiFS), such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), have become the key storage architectures for big data processing. These storage systems usually divide files into fixed-size blocks (or chunks). Each block is replicated (usually three-way) and distributed pseudo-randomly across the cluster. The master node (namenode) uses a huge table to record the locations of each block and its replicas. However, as the data grows, the block location table and its maintenance can occupy more than half of the memory space and 30% of the processing capacity of the master node, which severely limits the master node's scalability and performance. We argue that physical data distribution and maintenance should be separated from metadata management and performed autonomously by each storage node. In this paper, we propose Deister, a novel block management scheme built on an invertible deterministic declustering distribution method called Intersected Shifted Declustering (ISD). Deister is amenable to current research on scaling namespace management in the master node. In Deister, the huge table that maintains block locations in the master node is eliminated, and the block-node mapping is maintained autonomously on each data node. Results show that, compared with the default HDFS configuration, Deister achieves identical performance while saving about half of the RAM space and 30% of the processing capacity of the master node. It is expected to scale to double the size of the current single-namenode HDFS cluster, pushing the scalability bottleneck of the master node back to namespace management.
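To make the idea of replacing a location table with an invertible, computed placement concrete, here is a minimal sketch. It is not ISD, whose definition the abstract does not give. It is a toy modular placement chosen only for illustration, and it assumes the node count `n` is prime so that the replicas of a block always land on distinct nodes. The names `shift`, `locate`, and `blocks_on` are hypothetical.

```python
REPLICAS = 3  # three-way replication, as described in the abstract


def shift(row: int, n: int) -> int:
    """Per-row step between replicas; always in 1..n-1 (toy rule, an assumption)."""
    return 1 + row % (n - 1)


def locate(block: int, n: int) -> list[int]:
    """Forward mapping: compute the nodes holding `block` instead of looking them up."""
    row, col = divmod(block, n)
    s = shift(row, n)
    # With n prime and 1 <= s < n, the offsets 0, s, 2s are distinct modulo n.
    return [(col + r * s) % n for r in range(REPLICAS)]


def blocks_on(node: int, n: int, total_blocks: int) -> list[int]:
    """Inverse mapping: a node derives the blocks it should hold, with no table."""
    held = []
    rows = -(-total_blocks // n)  # ceiling division
    for row in range(rows):
        s = shift(row, n)
        for r in range(REPLICAS):
            # Solve (col + r * s) % n == node for col, then rebuild the block id.
            block = row * n + (node - r * s) % n
            if block < total_blocks:
                held.append(block)
    return sorted(held)
```

Under these assumptions, `locate(8, 7)` returns `[1, 3, 5]`, and `blocks_on(3, 7, 20)` includes block 8. Because both directions are pure functions of the block id, the node id, and the cluster size, neither the master nor the data nodes need to store a per-block location table in this toy model.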
