
The construction and usage of a microarray data warehousing system.


Abstract

The Human Genome Project, which began in 1988 and was completed in 2001, ushered in a new era of biology. This effort accelerated the development of biochemical assay and information technologies. As a result, biologists are now able to ask questions that were previously considered intractable. One example of a breakthrough in assay technology is the DNA microarray, a high-throughput measurement device that enables individual scientists to rapidly and simultaneously interrogate the RNA concentration levels of virtually all genes in the human genome for a single biological source. As with all advances, the advent of the DNA microarray has opened a new frontier of challenges. In this document, I describe an approach that addresses the problem of assembling, processing, and subsequently analyzing large volumes of data collected with DNA microarrays. My work is presented in 5 chapters and 3 appendices.

Chapter 1 serves as a general introduction to DNA microarray assay technology, the idiosyncrasies of using this technology in biological experiments, methods for preprocessing the resulting experimental data, and techniques used in the informatic systems that enable the processing, representation, storage, and subsequent retrieval of these data.

Chapter 2 is the core of the dissertation and describes the Celsius project, a microarray data warehousing system that is an implemented solution to the informatic problems described in Chapter 1. The completion of the Celsius project brought into existence the single largest publicly available source of primary and uniformly preprocessed DNA microarray data.

Chapter 3 builds upon Chapter 2 by describing an analysis of the data in Celsius. Specifically, it describes the creation of gene-gene correlation matrices and their application to gene annotation and to identifying disease genes within known linkage regions. While the idea of using gene-gene coexpression patterns is as old as DNA microarray technology itself, the scale of this analysis is unprecedented, and the demonstrated applicability of the correlation data to a broad set of biological questions raises concerns about the validity of current microarray data deposition systems, which rely heavily on experimental metadata.

Chapter 4 presents Biopackages.net, a technical subsystem of the data warehousing system described in Chapter 2. Reproducibility is a shared pillar of both scientific and data warehousing methods. Because Celsius depends heavily on computing systems to process the data stored in the warehouse, it was essential to have a mechanism for creating uniform and reproducible computing environments. This not only allows the system to scale as the volume of data inevitably increases, but also makes it possible to clone the system at other sites and to recover from failures.

Chapter 5 and Appendix A describe efforts in data modeling and dissemination. As we enter the post-genome era, new assay technologies continue to appear, and the volume of data generated by both existing and new technologies continues to grow at an accelerating rate. Thus, it is imperative that protocols be developed for encoding and distributing these data to individual scientists as well as to the information systems and agents acting on their behalf.

Appendix B and Appendix C present analyses performed on previous iterations of Celsius, which is described in Chapter 2. These early collaborations provided a glimpse of the utility of a microarray data warehouse; without them, the work described here would never have been completed.
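Chapter 3 describes building gene-gene correlation matrices and using coexpression to annotate genes. The abstract does not state which correlation measure or data layout Celsius used, so the following is only a minimal sketch under stated assumptions: expression values are already preprocessed and arranged as a genes × arrays matrix, Pearson correlation is the similarity measure, and a gene's most strongly correlated neighbors are treated as "guilt-by-association" candidates for shared function. The function names `gene_correlation_matrix` and `top_coexpressed` are hypothetical.

```python
import numpy as np


def gene_correlation_matrix(expr: np.ndarray) -> np.ndarray:
    """Pearson correlation between every pair of genes.

    Assumption (hypothetical layout): expr has shape (n_genes, n_arrays),
    and each row holds one gene's preprocessed expression values across
    all arrays in the warehouse.
    """
    # np.corrcoef treats each row as one variable by default (rowvar=True),
    # so the result is an (n_genes, n_genes) symmetric matrix.
    # Genes with zero variance yield NaN rows; they are not handled here.
    return np.corrcoef(expr)


def top_coexpressed(corr: np.ndarray, genes: list[str], query: str, k: int = 3):
    """Return the k genes most positively correlated with `query`.

    In a guilt-by-association scheme, annotations of these neighbors
    would be candidate annotations for the query gene.
    """
    i = genes.index(query)
    order = np.argsort(-corr[i])               # indices by descending correlation
    hits = [j for j in order if j != i][:k]    # skip the query gene itself
    return [(genes[j], float(corr[i, j])) for j in hits]


if __name__ == "__main__":
    # Toy data, not from the dissertation: 4 genes measured on 5 arrays.
    genes = ["A", "B", "C", "D"]
    expr = np.array([
        [1, 2, 3, 4, 5],
        [2, 4, 6, 8, 10],   # perfectly correlated with A
        [5, 4, 3, 2, 1],    # perfectly anti-correlated with A
        [1, 3, 2, 5, 4],    # partially correlated with A
    ], dtype=float)
    corr = gene_correlation_matrix(expr)
    print(top_coexpressed(corr, genes, "A", k=2))
```

On the toy data, gene B is A's strongest neighbor (correlation 1.0) and gene D follows (about 0.8), while the anti-correlated gene C is ranked last. At warehouse scale, a dense matrix over tens of thousands of genes becomes large, which is one reason such analyses are usually computed once and stored rather than recomputed per query.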

Bibliographic details

  • Author: Day, Allen Jason
  • Author affiliation: University of California, Los Angeles
  • Degree-granting institution: University of California, Los Angeles
  • Discipline: Bioinformatics
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 130
  • Original format: PDF
  • Language of text: English
