Schema design support for semi-structured data: Finding the sweet spot between NF and De-NF

Abstract

Contemporary storage systems increasingly offer schema flexibility and support for semi-structured data models. This is the case for document-oriented databases, which can therefore ingest data from heterogeneous sources (IoT, sensors, monitoring). The increased influx of data further emphasizes the need for horizontal and elastic scalability, which NoSQL document stores attain by simplifying query functionality and relaxing transactional properties, e.g. through eventual consistency. The most compelling benefits of document stores are attained when data is stored in a denormalized form (De-NF). For example, one can decide to store relationships as an embedded copy to increase read query performance and thus avoid costly cross-node consultations. Compared to the normalized form (NF), such designs come at the cost of additional data duplication, consistency issues, and decreased write and update performance. Determining the most appropriate data model for an application, however, depends on many factors, and the application developer is faced with the complexity of designing document data models that are optimized in terms of performance, scalability, storage and memory size, all of which require in-depth knowledge of the technology, the data meta-model, query plans and expected workloads. In this paper, we first discuss factors that impact the data schema design in document stores, such as the nature of the document and its attributes, horizontal partitioning, index selection, workload variability, and data uniformity. Although some data model design support tools exist, none of them systematically takes all these factors into account. Then, we outline our vision and roadmap towards systematic schema design support and tooling that involves (i) leveraging heuristics and common tactics to generate a finite number of candidate data models and (ii) ranking these candidate data models by means of cost functions that express their cost-effectiveness.
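The NF versus De-NF trade-off and the idea of ranking candidate data models by a cost function can be illustrated with a minimal sketch. It is not the authors' tooling: the document shapes, the `Workload` and `Candidate` types, the cost weights, and the figure of 20 orders per customer are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Normalized (NF): the order stores only a reference to the customer.
order_nf = {"_id": "o1", "customer_id": "c1", "total": 42}
customer = {"_id": "c1", "name": "Ada", "city": "Leuven"}

# Denormalized (De-NF): the order embeds a copy of the customer.
order_denf = {"_id": "o1", "customer": {"name": "Ada", "city": "Leuven"}, "total": 42}


@dataclass(frozen=True)
class Workload:
    reads: int    # order reads that also need customer data
    updates: int  # changes to a customer's details


@dataclass(frozen=True)
class Candidate:
    name: str
    lookups_per_read: int   # documents fetched to answer one read
    writes_per_update: int  # documents rewritten per customer update


def cost(c, w, read_cost=1.0, write_cost=2.0):
    """Toy cost function (assumed weights): total read cost plus total write cost."""
    return (w.reads * c.lookups_per_read * read_cost
            + w.updates * c.writes_per_update * write_cost)


def rank(candidates, w):
    """Return the candidates ordered from cheapest to most expensive."""
    return sorted(candidates, key=lambda c: cost(c, w))


ORDERS_PER_CUSTOMER = 20  # assumed number of embedded copies per customer

candidates = [
    # NF: one extra lookup per read, a single write per customer update.
    Candidate("NF", lookups_per_read=2, writes_per_update=1),
    # De-NF: a single lookup per read, but every embedded copy must be rewritten.
    Candidate("De-NF", lookups_per_read=1, writes_per_update=ORDERS_PER_CUSTOMER),
]

read_heavy = Workload(reads=1000, updates=10)
write_heavy = Workload(reads=100, updates=100)

print([c.name for c in rank(candidates, read_heavy)])   # ['De-NF', 'NF']
print([c.name for c in rank(candidates, write_heavy)])  # ['NF', 'De-NF']
```

Under these assumed numbers the embedded design ranks first for the read-heavy workload and last for the update-heavy one, which is the kind of workload dependence the abstract describes. A realistic cost function would also need to account for the other factors listed above, such as partitioning and index selection.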

Bibliographic details

Source
IEEE International Conference on Big Data | 2017 | pp. 2921-2930 | 10 pages
Authors
Vincent Reniers; Dimitri Van Landuyt; Ansar Rafique; Wouter Joosen
Original format: PDF
Keywords
Data models; Tools; Relational databases; Scalability; Indexes; Noise measurement

