2013 IEEE 29th International Conference on Data Engineering Workshops

HotROD: Managing grid storage with on-demand replication

Abstract

Enterprises (such as Yahoo!, LinkedIn, and Facebook) operate their own compute/storage infrastructure, which is effectively a “private cloud”. The private cloud consists of multiple clusters, each of which is managed independently. With HDFS, whenever data is stored in a cluster, it is replicated within that cluster for availability. Unfortunately, for datasets shared across the enterprise, this leads to over-replication within the private cloud. An analysis of Yahoo!'s HDFS usage suggests that the disk space consumed by replication of shared datasets is substantial (on the order of petabytes). New datasets are typically popular and are requested by many processing jobs in different clusters. This demand is satisfied by copying the dataset to each of those clusters. As datasets age, however, they are used less and become cold. We then have the opposite problem of data being over-replicated across clusters: each cluster holds enough replicas to recover from data loss locally, so the total number of replicas is high. We address both problems, initial replication and cross-cluster recovery, in a private cloud setting using the same technique: on-demand replication, which we refer to as Hot Replication On-Demand (HotROD). By making files visible across HDFS clusters, we let a cluster pull in remote replicas as needed, both for initial replication and for later recovery. We implemented HotROD as an extension to a standard HDFS installation.
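The pull-on-demand idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HotROD implementation and not the HDFS API: the `Cluster` and `Federation` classes, their methods, and the in-memory dictionaries are all hypothetical stand-ins for independently managed clusters and a shared cross-cluster view of files.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for one independently managed cluster (hypothetical)."""
    name: str
    files: dict = field(default_factory=dict)  # path -> bytes held locally


class Federation:
    """Hypothetical shared view that makes files visible across clusters."""

    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}

    def locate(self, path, exclude=None):
        """Return the names of clusters (other than `exclude`) holding `path`."""
        return [name for name, c in self.clusters.items()
                if name != exclude and path in c.files]

    def read(self, cluster_name, path):
        """Read `path` on a cluster, pulling a remote copy only if it is missing."""
        local = self.clusters[cluster_name]
        if path not in local.files:
            holders = self.locate(path, exclude=cluster_name)
            if not holders:
                raise FileNotFoundError(path)
            # On-demand replication: copy only when a job actually asks for the file.
            local.files[path] = self.clusters[holders[0]].files[path]
        return local.files[path]


a = Cluster("A", {"/data/logs": b"payload"})
b = Cluster("B")
fed = Federation([a, b])

fed.read("B", "/data/logs")   # initial replication: B pulls the file from A
del a.files["/data/logs"]     # simulate data loss in A
fed.read("A", "/data/logs")   # recovery: A pulls the file back from B
```

In this sketch the same `read` path serves both cases the abstract mentions: a cluster that never had the dataset fetches it on first access, and a cluster that lost it recovers from a remote holder, so no cluster needs to keep a full set of replicas in advance.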