
Multiple-testing strategy for analyzing cDNA array data on gene expression


Abstract

An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes rather than protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
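
The workflow in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the plot-based estimator or the exact decision rule, so the sketch substitutes a simple threshold-based estimate of the number of true nulls (p-values above a hypothetical tuning value `lam` are treated as coming only from uniform nulls) and picks the cutoff that minimizes an assumed expected misclassification cost. All function and parameter names are hypothetical.

```python
def estimate_true_nulls(pvalues, lam=0.5):
    """Estimate the number of true null hypotheses (m0).

    Assumption (not from the paper): p-values above `lam` come almost
    entirely from true nulls, which are uniform on [0, 1], so roughly
    m0 * (1 - lam) of them are expected to exceed `lam`.
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(float(len(pvalues)), above / (1.0 - lam))


def error_summary(pvalues, cutoff, m0):
    """Expected error counts and rates when genes with p <= cutoff are selected."""
    m = len(pvalues)
    m1 = m - m0                                   # estimated affected genes
    selected = sum(1 for p in pvalues if p <= cutoff)
    not_selected = m - selected
    # Uniform nulls: about m0 * cutoff of them fall at or below the cutoff.
    fp = min(float(selected), m0 * cutoff)
    tp = selected - fp
    fn = min(float(not_selected), max(0.0, m1 - tp))
    return {
        "fp": fp,
        "fn": fn,
        "fdr": fp / selected if selected else 0.0,           # false discovery rate
        "fndr": fn / not_selected if not_selected else 0.0,  # false nondiscovery rate
    }


def best_cutoff(pvalues, m0, cost_fp=1.0, cost_fn=1.0):
    """Pick the observed p-value that minimizes the assumed expected cost.

    cost_fp: cost of calling an unaffected gene affected.
    cost_fn: cost of calling an affected gene unaffected.
    """
    def expected_cost(cutoff):
        s = error_summary(pvalues, cutoff, m0)
        return cost_fp * s["fp"] + cost_fn * s["fn"]

    return min(sorted(set(pvalues)), key=expected_cost)
```

A typical call would estimate `m0` once, tabulate `error_summary` over a range of cutoffs to see how well the study resolves effects, and then call `best_cutoff` with a `cost_fn` larger than `cost_fp` when missing affected genes matters more than admitting a few false positives.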
