IEEE International Conference on Big Data

Schema design support for semi-structured data: Finding the sweet spot between NF and De-NF

Abstract

Contemporary storage systems increasingly offer schema flexibility and support for semi-structured data models. This is the case for document-oriented databases, which can therefore ingest data from heterogeneous sources (IoT, sensors, monitoring). The increased influx of data further emphasizes the need for horizontal and elastic scalability, which NoSQL document stores attain by simplifying query functionality and relaxing transactional properties, e.g., through eventual consistency. The most compelling benefits of document stores are attained when data is stored in a denormalized form (De-NF). For example, one can decide to store a relationship as an embedded copy to increase read query performance and thereby avoid costly cross-node lookups. Compared to the normalized form (NF), such designs come at the cost of additional data duplication, consistency challenges, and decreased write and update performance. Determining the most appropriate data model for an application, however, depends on many factors. The application developer faces the complexity of designing document data models that are optimized in terms of performance, scalability, storage, and memory size, all of which requires in-depth knowledge of the technology, the data meta-model, query plans, and expected workloads. In this paper, we first discuss factors that impact data schema design in document stores, such as the nature of the document and its attributes, horizontal partitioning, index selection, workload variability, and data uniformity. Although some data model design support tools exist, none systematically takes all these factors into account. We then outline our vision and roadmap towards systematic schema design support and tooling that involves (i) leveraging heuristics and common tactics to generate a finite number of candidate data models and (ii) ranking these candidate data models by means of cost functions that express their cost-effectiveness.
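
The two-step idea in the abstract (generate a finite set of candidate data models, then rank them with a cost function) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Workload`, `Candidate`, `total_cost` and `rank`, the linear form of the cost function, and all of the numeric cost figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Expected workload (hypothetical units: operations per second)."""
    reads_per_sec: float
    updates_per_sec: float


@dataclass(frozen=True)
class Candidate:
    """A candidate data model with made-up relative unit costs."""
    name: str
    read_cost: float     # relative cost of one read query
    update_cost: float   # relative cost of one update
    storage_cost: float  # relative storage footprint


def total_cost(c: Candidate, w: Workload, storage_weight: float = 1.0) -> float:
    # Assumed linear cost function: workload-weighted read and update
    # costs plus a weighted storage term. Not the paper's cost function.
    return (w.reads_per_sec * c.read_cost
            + w.updates_per_sec * c.update_cost
            + storage_weight * c.storage_cost)


def rank(candidates: list[Candidate], w: Workload) -> list[Candidate]:
    # Cheapest candidate first.
    return sorted(candidates, key=lambda c: total_cost(c, w))


# Two illustrative candidates. The numbers only encode the qualitative
# trade-off from the abstract: embedding speeds up reads but duplicates
# data and makes updates more expensive.
CANDIDATES = [
    Candidate("referenced (NF)", read_cost=2.0, update_cost=1.0, storage_cost=1.0),
    Candidate("embedded (De-NF)", read_cost=1.0, update_cost=3.0, storage_cost=2.0),
]

read_heavy = Workload(reads_per_sec=100, updates_per_sec=1)
write_heavy = Workload(reads_per_sec=1, updates_per_sec=100)

if __name__ == "__main__":
    for label, w in [("read-heavy", read_heavy), ("write-heavy", write_heavy)]:
        ordering = [c.name for c in rank(CANDIDATES, w)]
        print(label, ordering)
```

Under these made-up figures the embedded model ranks first for the read-heavy workload and the referenced model ranks first for the write-heavy one. A realistic cost function would also have to cover the other factors the abstract lists, such as horizontal partitioning, index selection, and data uniformity.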