The construction and usage of a microarray data warehousing system.


Abstract

The human genome project, which began in 1988 and was completed in 2001, ushered in a new era of biology. This effort accelerated the development of biochemical assay and information technologies. As a result, biologists are now able to ask questions that were previously considered intractable. One example of a breakthrough in assay technology is the DNA microarray, a high-throughput measurement device that enables individual scientists to rapidly and simultaneously interrogate the RNA concentration levels of virtually all genes in the human genome for a single biological source. As with all advances, the advent of the DNA microarray has created a new frontier of challenges. In this document, I describe an approach that addresses the problem of assembly, processing, and subsequent analysis of large volumes of data collected with DNA microarrays. My work is presented in 5 chapters and 3 appendices.

Chapter 1 serves as a general introduction to DNA microarray assay technology, the idiosyncrasies of using this technology in biological experiments, methods for preprocessing the resulting experimental data, and techniques used in the informatic systems that enable the processing, representation, storage, and subsequent retrieval of these data.

Chapter 2 is the core of the dissertation and describes the Celsius project, a microarray data warehousing system that is an implemented solution to the informatic problems described in Chapter 1. The completion of the Celsius project brought into existence the single largest publicly available source of primary and uniformly pre-processed DNA microarray data.

Chapter 3 builds upon Chapter 2 by describing an analysis of the data present in Celsius. Specifically, it describes the creation of gene-gene correlation matrices and their application in performing gene annotation and identifying disease genes within known linkage regions. While the idea of using gene-gene coexpression patterns is as old as DNA microarray technology itself, the scale of this analysis is unprecedented, and the demonstrated applicability of the correlation data to a broad set of biological questions raises concerns about the validity of current microarray data deposition systems, which rely heavily on experimental metadata.

Chapter 4 presents Biopackages.net, a technical subsystem of the data warehousing system described in Chapter 2. Reproducibility is a shared pillar of both scientific and data warehousing methods. Because Celsius depends heavily on computing systems to process the data stored in the warehouse, it was essential to have a mechanism for making uniform and reproducible computing environments. This not only allows the system to scale as the volume of data inevitably increases, but also makes it possible to clone the system at other sites and to recover from failures.

Chapter 5 and Appendix A describe efforts for data modeling and dissemination. As we enter the post-genome era, new assay technologies continue to appear, and the growth in volume of existing and new data generated from each technology continues to accelerate. Thus, it is imperative that protocols be developed for the encoding and distribution of these data to both individual scientists and the information systems and agents acting on their behalf.

Appendix B and Appendix C present analyses performed on previous iterations of Celsius, which is described in Chapter 2. These early collaborations provided a glimpse of the utility of creating a microarray data warehouse, without which the work described here would never have been completed.


Bibliographic details

Author
Day, Allen Jason.
Author affiliation
University of California, Los Angeles.
Degree-granting institution University of California, Los Angeles.
Subject Bioinformatics.
Degree Ph.D.
Year 2008
Pages 130 p.
Total pages 130
Original format PDF
Language of text English

Similar literature

1. Analysis of gene expression in three types of epithelial ovarian tumors using cDNA microarrays and tissue microarrays [J] . 郑敏, Simon R, Kononen J, 癌症（英文版） . 2004, No. 007
2. On construction of a big data warehouse accessing platform for campus power usages [J] . Chih-Hung Chang, Fuu-Cheng Jiang, Chao-Tung Yang, Journal of Parallel and Distributed Computing . 2019, No. Nova
3. Biological Data Warehousing System for Identifying Transcriptional Regulatory Sites From Gene Expressions of Microarray Data [J] . Tsou A.-P., Sun Y.-M., Liu C.-L., IEEE transactions on information technology in biomedicine . 2006, No. 3
4. The basis for bibliomining: Frameworks for bringing together usage-based data mining and bibliometrics through data warehousing in digital library services [J] . Scott Nicholson, Information Processing & Management . 2006, No. 3
5. A biological data warehousing system for identifying transcriptional regulatory sites from gene expressions of microarray data [C] . Jorng-Tzong Horng, Emerging Information Technology Conference, 2005 . 2005
6. Explaining variation in data warehouse usage: An interpretation perspective. [D] . Brohman, M. Kathryn. 2000
7. MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses [O] . Ranko Gacesa, Jurica Zucko, Solveig K. Petursdottir, 2017
8. Data warehousing and data quality for a Spatial Decision Support System. [O] . Dill Robert W., 1997
