A database server for next-generation scientific data management.

Abstract

The growth of scientific information and the increasing automation of data collection have made databases integral to many scientific disciplines including life sciences, physics, meteorology, earth and atmospheric sciences, and chemistry. These sciences pose new data management challenges to current database system technologies. This dissertation addresses the following three challenges:

(1) Annotation Management: Annotations and provenance information are important metadata that go hand-in-hand with scientific data. Annotating scientific data represents a vital mechanism for scientists to share knowledge and build an interactive and collaborative environment. A major challenge is: How to manage large volumes of annotations, especially at various granularities, e.g., cell, column, and row level annotations, along with their corresponding data items.

(2) Complex Dependencies Involving Real-world Activities: The processing of scientific data is a complex cycle that may involve sequences of activities external to the database system, e.g., wet-lab experiments, instrument readings, and manual measurements. These external activities may incur inherently long delays to prepare for and to conduct. Updating a database value may render parts of the database inconsistent until some external activity is executed and its output is reflected back and updated into the database. The challenge is: How to integrate these external activities within the database engine and accommodate the long delays between the updates while making the intermediate results instantly available for querying.

(3) Fast Access to Scientific Data with Complex Data Types: Scientific experiments produce large volumes of data of complex types, e.g., arrays, images, long sequences, and multi-dimensional data. A major challenge is: How to provide fast access to these large pools of scientific data with non-traditional data types.

In this dissertation, I present extensions to current database engines to address the above challenges. The proposed extensions enable scientific data to be stored and processed within their natural habitat: the database system. Experimental studies and performance analysis for all the proposed algorithms are carried out using both real-world and synthetic datasets. Our results show the applicability of the proposed extensions and their performance gains over other existing techniques and algorithms.

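Bibliographic record

  • Author: Eltabakh, Mohamed Y.
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 174 p.
  • Total pages: 174
  • Original format: PDF
  • Language of text: English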
