
Borrowing information across genes and experiments for improved error variance estimation in microarray data analysis and statistical inferences for gene expression heterosis.



Abstract

In this dissertation, we develop statistical models and methods for microarray data that borrow information across genes, and in some cases across experiments, to improve statistical inference for specific biological questions.

In Chapter 2, we develop statistical methods to improve the estimation of gene expression error variances. Good estimation of error variances is crucial for detecting differentially expressed genes (genes that differ in mean expression level across treatments or conditions of interest). Since the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for several experiments that study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.

In Chapter 3, we develop statistical methods to improve the estimation and testing of gene expression heterosis. Heterosis, also known as hybrid vigor, refers to the superior phenotype of the hybrid offspring relative to its two inbred parents. Though the heterosis phenomenon has been extensively utilized in agriculture for over a century, its molecular basis is still unknown. In an effort to understand the basic mechanisms responsible for phenotypic heterosis at the molecular level, researchers have begun to compare expression levels of thousands of genes in the parental inbred lines and their offspring to find genes that exhibit gene expression heterosis. In our study, we focus on three types of gene expression heterosis: high-parent heterosis, low-parent heterosis, and mid-parent heterosis. Currently, the sample average method is the most commonly used method for estimation and testing of gene expression heterosis. However, the sample average estimators underestimate high-parent heterosis and low-parent heterosis, which leads to a loss of power in hypothesis testing. The sample average estimator for mid-parent heterosis is unbiased, but with only a few replicates in a typical microarray experiment, it is highly variable. To improve the estimation and testing of all three types of gene expression heterosis, we develop a hierarchical model that permits information sharing across genes. Based on the model, we derive empirical Bayes estimators and test gene expression heterosis using posterior probabilities. The effectiveness of our approach is demonstrated through simulations based on two real heterosis microarray experiments as well as hypothetical probability models that violate our model assumptions.

Chapter 4 presents a statistical analysis of a soil-based carbon sequestration experiment. Motivated by global climate change due to the increasing level of atmospheric carbon dioxide, researchers have proposed a soil-based carbon sequestration approach, which reduces carbon dioxide emission from crop residues after harvesting and sequesters more carbon into the land as a soil nutrient. Previous research has reported significant differences across species in their rates of residue decomposition and amounts of carbon dioxide emission. Because biomass composition varies across maize genotypes, we hypothesize that there are also differences among genotypes within the maize species in their rates of biomass decomposition and capacity for carbon sequestration. We designed and performed a longitudinal experiment to measure the carbon dioxide flux from crop stover samples of 14 maize varieties. Flux observations were collected for more than 150 days. We modeled the logarithm of carbon dioxide flux as a linear function of genotype, day, and genotype-by-day interaction effects, as well as several other important fixed and random factors. The results show significant differences among maize varieties in the accumulated carbon dioxide flux from crop residues and in the flux pattern over time. We also investigate relationships between carbon dioxide emission and several potentially influential chemical compounds in the maize residue biomass composition. These results suggest the potential for development of "carbon capturing crops" through bioengineering or hybrid methods. (Abstract shortened by UMI.)
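
The abstract's claim that sample averages underestimate high-parent heterosis while remaining unbiased for mid-parent heterosis can be checked with a small simulation. The sketch below is illustrative only and is not the dissertation's method. It assumes normally distributed expression values with a common error standard deviation, and it assumes the common definitions of high-parent heterosis as the hybrid mean minus the larger parental mean and of mid-parent heterosis as the hybrid mean minus the average of the parental means. The function name and all parameter values are hypothetical.

```python
import numpy as np


def simulate_heterosis_bias(mu_p1=0.0, mu_p2=0.0, mu_h=1.0, sigma=1.0,
                            n_rep=4, n_sim=200_000, seed=0):
    """Monte Carlo bias of sample-average heterosis estimators for one gene.

    Assumptions (not taken from the dissertation): normal errors, equal
    error SD `sigma` for all three genotypes, `n_rep` replicates each.
    """
    rng = np.random.default_rng(seed)

    # The mean of n_rep normal replicates is N(mu, sigma^2 / n_rep),
    # so the sample means can be drawn directly.
    se = sigma / np.sqrt(n_rep)
    p1_bar = rng.normal(mu_p1, se, n_sim)   # parent 1 sample means
    p2_bar = rng.normal(mu_p2, se, n_sim)   # parent 2 sample means
    h_bar = rng.normal(mu_h, se, n_sim)     # hybrid sample means

    # Plug-in (sample average) estimators.
    hph_hat = h_bar - np.maximum(p1_bar, p2_bar)   # high-parent heterosis
    mph_hat = h_bar - 0.5 * (p1_bar + p2_bar)      # mid-parent heterosis

    # True values under the assumed definitions.
    true_hph = mu_h - max(mu_p1, mu_p2)
    true_mph = mu_h - 0.5 * (mu_p1 + mu_p2)

    return {
        "hph_bias": float(hph_hat.mean() - true_hph),
        "mph_bias": float(mph_hat.mean() - true_mph),
    }


if __name__ == "__main__":
    result = simulate_heterosis_bias()
    print(f"high-parent heterosis bias: {result['hph_bias']:.3f}")
    print(f"mid-parent heterosis bias:  {result['mph_bias']:.3f}")
```

The high-parent bias is negative because the maximum of two noisy sample means tends to exceed the maximum of the true means. When the two parental means are equal, the expected overshoot is `sigma / sqrt(n_rep * pi)`, so the bias shrinks only slowly as replicates are added. The mid-parent estimator is a linear combination of sample means and therefore shows no systematic bias.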

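Bibliographic record

  • Author: Ji, Tieming
  • Author affiliation: Iowa State University
  • Degree-granting institution: Iowa State University
  • Subject: Biology Biostatistics; Statistics
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 107 p.
  • Total pages: 107
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: (none listed)
  • Keywords: (none listed)
  • Date added: 2022-08-17 11:43:39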
