Journal: Information Systems

Metadata management for scientific databases

Abstract

Most scientific databases consist of datasets (or sources), which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata describing the process that led to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results are recovered after executing the query and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing.

In this paper, we present ScQL, a new algebraic relational language whose operations apply to objects consisting of data-metadata pairs and preserve this one-to-one correspondence throughout the computation. We formally define each operation and describe an optimization, called meta-first, that may significantly reduce query processing overhead by anticipating the use of metadata so that only those input samples that contribute to the result samples are loaded into the execution environment.

In ScQL, metadata have the same relevance as data and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or query processing, thereby yielding a new form of metadata provenance. We present many examples of the use of ScQL in several application domains, and we demonstrate the effectiveness of the meta-first optimization. (C) 2018 The Authors. Published by Elsevier Ltd.
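
The meta-first idea described in the abstract can be illustrated with a minimal Python sketch. This is not ScQL syntax or the authors' implementation: the names `SampleRef`, `meta_first_select`, `make_loader`, and the `derived_from` metadata key are all hypothetical, and the example assumes that metadata are small enough to inspect without loading a sample's data. It only shows the general pattern of evaluating the metadata predicate first, loading data for matching samples alone, and keeping each result paired with its metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical data row: (measurement name, value)


@dataclass
class SampleRef:
    """A sample whose metadata is in memory and whose data is loaded lazily."""
    sample_id: str
    metadata: Dict[str, str]
    load: Callable[[], List[Row]]


def meta_first_select(dataset, meta_pred, data_pred):
    """Filter samples by metadata first, then load and filter their data."""
    result = []
    for ref in dataset:
        if not meta_pred(ref.metadata):
            continue  # rejected on metadata alone: data is never loaded
        rows = [row for row in ref.load() if data_pred(row)]
        # Keep the data-metadata pairing and record which input was used
        # (hypothetical provenance key, for illustration only).
        out_meta = dict(ref.metadata)
        out_meta["derived_from"] = ref.sample_id
        result.append((ref.sample_id, out_meta, rows))
    return result


loaded = []  # records which samples actually had their data loaded


def make_loader(sample_id, rows):
    def load():
        loaded.append(sample_id)
        return rows
    return load


dataset = [
    SampleRef("s1", {"condition": "treated"},
              make_loader("s1", [("a", 0.9), ("b", 0.2)])),
    SampleRef("s2", {"condition": "control"},
              make_loader("s2", [("a", 0.7)])),
]

result = meta_first_select(
    dataset,
    meta_pred=lambda m: m.get("condition") == "treated",
    data_pred=lambda row: row[1] > 0.5,
)
```

In this sketch only `s1` is loaded, because `s2` is rejected on its metadata before its loader is ever called. The saving in a real system would depend on how expensive sample loading is relative to metadata inspection.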