Published in: Information Systems

Schema profiling of document-oriented databases

Abstract

In document-oriented databases, schema is a soft concept, and the documents in a collection can be stored using different local schemata. This gives designers and implementers greater flexibility; however, when sets of documents with different (and possibly conflicting) schemata are to be analyzed or integrated, extra effort is required to understand the rules that drove the use of alternative schemata. In this paper we propose a technique, called schema profiling, that explains the schema variants within a collection in a document-oriented database by capturing the hidden rules behind the use of these variants. We express these rules in the form of a decision tree (the schema profile). Consistent with the requirements we elicited from real users, we aim to create explicative, precise, and concise schema profiles. The algorithm we adopt to this end is inspired by the well-known C4.5 classification algorithm and builds on two original features: the coupling of value-based and schema-based conditions within schema profiles, and the introduction of a novel measure of entropy to assess the quality of a schema profile. A set of experimental tests on both synthetic and real datasets demonstrates the effectiveness and efficiency of our approach. © 2018 Elsevier Ltd. All rights reserved.
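To make the idea of a decision-tree split over schema variants more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that a document's local schema can be approximated by its set of top-level field names, and it scores candidate conditions with the standard Shannon-entropy information gain used in C4.5-style trees, not the novel entropy measure the abstract mentions. All function names, field names, and sample documents are hypothetical.

```python
from collections import Counter
from math import log2


def local_schema(doc):
    """Approximate a document's local schema by its top-level field names (assumption)."""
    return frozenset(doc.keys())


def entropy(labels):
    """Standard Shannon entropy of a list of class labels (here: schema variants)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(docs, condition):
    """Entropy reduction over schema variants when docs are split by a boolean condition."""
    labels = [local_schema(d) for d in docs]
    left = [local_schema(d) for d in docs if condition(d)]
    right = [local_schema(d) for d in docs if not condition(d)]
    if not left or not right:
        return 0.0  # the condition does not actually split the collection
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(docs)
    return entropy(labels) - weighted


# Hypothetical collection with two schema variants.
docs = [
    {"user": "a", "device": "watch", "heart_rate": 61},
    {"user": "b", "device": "watch", "heart_rate": 72},
    {"user": "c", "device": "phone", "steps": 4000},
    {"user": "d", "device": "phone", "steps": 9500},
]


def value_based(doc):
    """Value-based condition: tests the value of a field."""
    return doc.get("device") == "watch"


def schema_based(doc):
    """Schema-based condition: tests the presence of a field."""
    return "heart_rate" in doc


def unrelated(doc):
    """A condition that carries no information about the schema variant."""
    return doc["user"] in ("a", "c")


for name, cond in [("value-based", value_based),
                   ("schema-based", schema_based),
                   ("unrelated", unrelated)]:
    print(name, information_gain(docs, cond))
```

In this toy collection, the value-based condition on `device` separates the two variants just as well as the schema-based test on `heart_rate`, but it explains why the variant is used instead of merely describing it, which is the kind of rule a schema profile is meant to surface.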
