Conference: IEEE International Symposium on Information Theory

Coded Data Rebalancing: Fundamental Limits and Constructions

Abstract

Distributed databases often suffer from an unequal distribution of data among storage nodes, known as ‘data skew’. Data skew arises from a number of causes, such as the removal of existing storage nodes and the addition of new, empty nodes to the database. Data skew degrades performance and thus necessitates ‘rebalancing’ at regular intervals to reduce the amount of skew. We define an r-balanced distributed database as a distributed database in which every node stores the same amount of data and each bit of the data is replicated in r distinct storage nodes. We consider the problem of designing such balanced databases, along with associated rebalancing schemes that maintain the r-balanced property under node removal and addition operations. We present a class of r-balanced databases (parameterized by the number of storage nodes) that have the property of structural invariance, i.e., the databases designed for different numbers of storage nodes have the same structure. For this class of r-balanced databases, we present rebalancing schemes that use coded transmissions between storage nodes, and we characterize their communication loads under node addition and removal. We show that the communication cost incurred to rebalance our distributed database under node addition and removal is optimal, i.e., it achieves the minimum possible cost among all possible balanced distributed databases and rebalancing schemes.
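The r-balanced definition above can be illustrated with a minimal Python sketch. It is not the construction or the rebalancing scheme from the paper: the cyclic placement, the equal-size segment assumption, and the names `cyclic_placement` and `is_r_balanced` are all hypothetical, chosen only to show what the property requires and why removing a node breaks it until rebalancing is done.

```python
from collections import Counter

def cyclic_placement(num_nodes, r):
    """Hypothetical layout (an assumption, not the paper's design):
    equal-size segment i is stored on nodes i, i+1, ..., i+r-1 (mod num_nodes)."""
    if not 1 <= r <= num_nodes:
        raise ValueError("r must satisfy 1 <= r <= num_nodes")
    return {seg: {(seg + j) % num_nodes for j in range(r)}
            for seg in range(num_nodes)}

def is_r_balanced(placement, num_nodes, r):
    """Check the two conditions of the definition, assuming equal-size segments:
    every segment is on exactly r distinct nodes, and all nodes store equally much."""
    if any(len(nodes) != r for nodes in placement.values()):
        return False
    load = Counter(node for nodes in placement.values() for node in nodes)
    return len({load[n] for n in range(num_nodes)}) == 1

placement = cyclic_placement(num_nodes=5, r=2)
balanced_before = is_r_balanced(placement, num_nodes=5, r=2)

# Remove node 4 without any rebalancing: its segments lose one replica each.
after_removal = {seg: nodes - {4} for seg, nodes in placement.items()}
balanced_after = is_r_balanced(after_removal, num_nodes=4, r=2)

print(balanced_before, balanced_after)
```

In this toy layout the database is r-balanced before the removal and not afterwards, since two segments are left with a single replica. Restoring the property requires moving data between the surviving nodes, which is the communication cost the abstract refers to.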