Conference: IEEE International Conference on Data Engineering

Cinderella — Adaptive online partitioning of irregularly structured data

Abstract

In an increasing number of use cases, databases face the challenge of managing irregularly structured data. Irregularly structured data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table, which usually results in a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem: queries that reference only a subset of attributes have to read the whole universal table, including many irrelevant entities. One possible solution is to partition the table, which allows partitions of irrelevant entities to be pruned before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible due to the enormous complexity, so an autonomous solution is desirable. In this paper, we define the Online Partitioning Problem for irregularly structured data and present Cinderella, an autonomous online algorithm for horizontal partitioning of irregularly structured entities in universal tables. It is designed to keep its overhead low by incrementally assigning entities to partitions while they are touched anyway during modifications. The resulting partitioning allows queries that retrieve only entities with a subset of attributes to easily prune partitions of irrelevant entities. Cinderella thus increases the locality of queries and reduces query execution cost.
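
The general idea of incremental assignment and partition pruning can be illustrated with a minimal Python sketch. This is not the Cinderella algorithm itself; the abstract does not specify its assignment criterion. The Jaccard similarity rule, the `threshold` and `capacity` parameters, and all class and function names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A horizontal partition plus a synopsis of the attributes it contains."""
    attributes: set = field(default_factory=set)
    entities: list = field(default_factory=list)


def jaccard(a, b):
    """Similarity of two attribute sets (assumed criterion, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class OnlinePartitioner:
    """Hypothetical online partitioner: entities are placed one at a time."""

    def __init__(self, threshold=0.5, capacity=1000):
        self.threshold = threshold  # minimum similarity to join a partition
        self.capacity = capacity    # maximum entities per partition
        self.partitions = []

    def insert(self, entity):
        """Assign an entity (a dict of attribute -> value) to a partition."""
        attrs = set(entity)
        candidates = [p for p in self.partitions
                      if len(p.entities) < self.capacity]
        best = max(candidates,
                   key=lambda p: jaccard(attrs, p.attributes),
                   default=None)
        # Open a new partition when nothing is similar enough.
        if best is None or jaccard(attrs, best.attributes) < self.threshold:
            best = Partition()
            self.partitions.append(best)
        best.entities.append(entity)
        best.attributes |= attrs

    def query(self, required):
        """Return entities having all required attributes, and the number
        of partitions that actually had to be scanned."""
        required = set(required)
        rows, scanned = [], 0
        for p in self.partitions:
            if not required <= p.attributes:
                continue  # pruned: no entity here can have all attributes
            scanned += 1
            rows.extend(e for e in p.entities if required.issubset(e))
        return rows, scanned


partitioner = OnlinePartitioner(threshold=0.5)
for entity in [
    {"name": "a", "isbn": "1"},
    {"name": "b", "isbn": "2", "pages": 100},
    {"name": "c", "gps": "x", "speed": 3},
    {"name": "d", "gps": "y"},
]:
    partitioner.insert(entity)

rows, scanned = partitioner.query(["gps"])
print(len(partitioner.partitions), scanned, [r["name"] for r in rows])
```

In this toy run the book-like and vehicle-like entities end up in separate partitions, so a query on `gps` scans only one of the two partitions. A real system would also need to handle updates, deletions, and partition reorganization, which this sketch omits.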
