Journal: Systems Biomedicine

Increasing the discovery power of -omics studies

Abstract

Motivation: Current clinical and biological studies apply different biotechnologies and then combine the resulting -omics data to test biological hypotheses. The plethora of -omics data and their combinations generate a large number of hypotheses and apparently increase the power of a study. Contrary to these expectations, the wealth of -omics data may even reduce the statistical power of a study because of the large correction factor required for multiple testing. Typically, this loss of power in analyzing -omics data is caused by an increased false detection rate (FDR) in the measurements, such as falsely detected DNA copy number changes or falsely identified differentially expressed genes. The false detections are random and therefore unrelated to the tested conditions. Thus, a high FDR considerably decreases the discovery power of a study, especially if different -omics data are involved.

Results: On a HapMap data set, where known copy number variations (CNVs) have to be re-detected, I/NI call filtering was much more efficient than variance-based filtering. In particular, the I/NI call filter outperforms variance-based filters on data with rare events, such as the CNVs in the HapMap data set. We also assessed the efficiency of the I/NI call filter in reducing the FDR on two different cancer cell lines, where it reduced the FDR 18- to 22-fold.

Materials and Methods: One strategy for mitigating an excessively high FDR is to filter out putative false detections. We suggest using probabilistic latent variable models to identify putative false detections, which such models can reveal through a high estimated noise level or through model-based measurement inconsistencies across samples. To select such a model, a Bayesian approach starts with the maximum a priori model, which assumes no detection, and selects the maximum a posteriori model. Hence, a detection corresponds to a deviation of the maximum a posteriori model from the maximum a priori model, measured by the information gain obtained from the data. If this information gain exceeds a threshold, the selected model obtains an Informative/Non-Informative (I/NI) call that indicates a detection. I/NI call filtering has been applied successfully in different projects, but it had not yet been shown that correction for multiple testing after I/NI call filtering still controls the type-I error rate. We prove this important property of the I/NI call and show that the I/NI call is independent of commonly used test statistics under the null hypothesis. We apply the I/NI call to transcriptomics (gene expression), where the prior model corresponds to a constant gene expression level across the compared samples, and to genomics, analyzing copy number variation (CNV) data, where the prior model corresponds to a constant DNA copy number of 2 across the compared samples.
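The point that more tests can mean fewer discoveries can be illustrated with a toy example. This sketch uses the Benjamini-Hochberg step-up procedure as one common multiple-testing correction (the abstract does not say which correction the study used), and all p-values are invented for illustration. Three strong hypothetical signals survive correction on their own but are lost once 997 uninformative tests enter the same correction.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    n_rejected = 0
    for k, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: reject the k smallest p-values for the largest k
        # with p_(k) <= k * alpha / m.
        if p <= k * alpha / m:
            n_rejected = k
    return n_rejected


if __name__ == "__main__":
    signals = [0.0001, 0.0004, 0.002]          # hypothetical true effects
    nulls = [i / 997 for i in range(1, 998)]   # hypothetical uninformative tests
    print(benjamini_hochberg(signals))          # 3
    print(benjamini_hochberg(signals + nulls))  # 0
```

Removing uninformative tests before correction is only valid if the filter does not bias the p-values that remain, which is the property the abstract reports proving for the I/NI call.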
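A minimal sketch of an information-gain threshold, assuming a one-factor Gaussian latent variable model x = λz + ε with prior z ~ N(0, 1) and diagonal noise covariance Ψ, in the spirit of the FARMS-type models in which I/NI calls were first used. Under this assumption the posterior variance of z is σ² = 1 / (1 + λᵀΨ⁻¹λ), and the sketch measures information gain as the entropy reduction ½ ln(1/σ²) from prior to posterior. The function names, this choice of gain, and the default threshold (½ ln 2 nats, equivalent to a posterior variance below 0.5) are hypothetical illustrations, not the exact criterion from the paper; the loadings and noise variances are taken as already estimated.

```python
import math


def posterior_variance(loadings, noise_vars):
    """Posterior variance of z for x = loadings * z + eps, z ~ N(0, 1), eps ~ N(0, diag(noise_vars))."""
    if len(loadings) != len(noise_vars):
        raise ValueError("loadings and noise_vars must have the same length")
    # lambda^T Psi^{-1} lambda: how strongly the measurements constrain z
    signal_to_noise = sum(l * l / psi for l, psi in zip(loadings, noise_vars))
    return 1.0 / (1.0 + signal_to_noise)


def information_gain(loadings, noise_vars):
    """Entropy reduction (nats) from the N(0, 1) prior to the Gaussian posterior of z."""
    # H[N(0, 1)] - H[N(m, s2)] = 0.5 * ln(1 / s2); the posterior mean m does not enter
    return 0.5 * math.log(1.0 / posterior_variance(loadings, noise_vars))


# Hypothetical threshold: 0.5 * ln 2 nats, i.e. posterior variance below 0.5
DEFAULT_THRESHOLD = 0.5 * math.log(2.0)


def ini_call(loadings, noise_vars, threshold=DEFAULT_THRESHOLD):
    """Return "I" (informative) if the information gain exceeds the threshold, else "NI"."""
    return "I" if information_gain(loadings, noise_vars) > threshold else "NI"


if __name__ == "__main__":
    # Strong loadings relative to noise: the posterior moves far from the prior
    print(ini_call([1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))  # I
    # Weak loadings relative to noise: the posterior stays close to the prior
    print(ini_call([0.1, 0.1, 0.1], [1.0, 1.0, 1.0]))  # NI
```

Because the posterior of z is Gaussian in this toy model, the gain depends only on the estimated loadings and noise variances, so a model that explains little beyond noise stays near the "no detection" prior and receives an NI call.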
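The claim that multiple-testing correction remains valid after I/NI filtering rests on the filter being independent of the test statistic under the null hypothesis. This simulation illustrates that general principle with simpler stand-ins, not with the I/NI call itself: the test is a two-sided z-test with a known noise SD of 1, the "independent" filter keeps features whose overall mean across both groups is positive, and the "dependent" filter keeps features with a large group difference. Sample sizes, the cutoff, and the seed are arbitrary assumptions.

```python
import math
import random


def simulate_null(n_features=20000, n_per_group=5, alpha=0.05, seed=0):
    """Fraction of p < alpha among features passing each filter when no feature has a true effect."""
    rng = random.Random(seed)
    kept = {"none": [], "independent": [], "dependent": []}
    se = math.sqrt(2.0 / n_per_group)  # SD of the mean difference when the noise SD is 1
    z_cut = 0.6745  # about the median of |N(0, 1)|, so the dependent filter keeps about half
    for _ in range(n_features):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (sum(a) - sum(b)) / n_per_group / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided z-test p-value
        kept["none"].append(p)
        # Independent filter: the overall mean ignores group labels and, for
        # Gaussian noise, is independent of the mean difference under the null.
        if sum(a) + sum(b) > 0.0:
            kept["independent"].append(p)
        # Dependent filter: selects on the group difference, i.e. on z itself.
        if abs(z) > z_cut:
            kept["dependent"].append(p)
    return {name: sum(q < alpha for q in ps) / len(ps) for name, ps in kept.items()}


if __name__ == "__main__":
    for name, rate in simulate_null().items():
        print(f"{name:>12}: fraction of p < 0.05 = {rate:.3f}")
```

With no true effects, the fraction of p < 0.05 should stay near 0.05 after the independent filter but roughly double after the dependent one, because that filter preferentially keeps features whose test statistic is already large.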