Statistical Applications in Genetics and Molecular Biology

Hypothesis Tests for Point-Mass Mixture Data with Application to 'Omics Data with Many Zero Values

Abstract

Data composed of a continuous component plus a point-mass frequently arises in genomic studies. The distribution of this type of data is characterized by the proportion of observations in the point mass and the distribution of the continuous component. Standard statistical methods focus on one of these effects at a time and can fail to detect differences between experimental groups. We propose a novel empirical likelihood ratio test (LRT) statistic for simultaneously testing the null hypothesis of no difference in point-mass proportions and no difference in means of the continuous component. This study evaluates the performance of the empirical LRT and three existing point-mass mixture statistics: 1) Two-part statistic with a t-test for testing mean differences (Two-part t), 2) Two-part statistic with Wilcoxon test for testing mean differences (Two-part W), and 3) parametric LRT.

Our investigations begin with an analysis of metabolomics data from Arabidopsis thaliana, which contains many metabolites with a large proportion of observed concentrations in a point-mass at zero. All four point-mass mixture statistics identify more significant differences than standard t-tests and Wilcoxon tests. The empirical LRT appears particularly effective. These findings motivate a large simulation study that assesses Type I and Type II error of the four test statistics with various choices of null distribution. The parametric LRT is frequently the most powerful test, as long as the model assumptions are correct. As is common in ‘omics data, the Arabidopsis metabolites have widely varying concentration distributions. A single parametric distribution cannot effectively represent all of these distributions, and individually selecting the optimal parametric distribution to use in the LRT for each metabolite is not practical. The empirical LRT, which does not require parametric assumptions, provides an attractive alternative to parametric and standard methods.
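
The abstract names the Two-part t statistic without giving its formula. As a rough illustration only, the sketch below assumes the commonly used construction: a squared two-proportion z statistic for the point-mass (zero) proportions plus a squared pooled-variance t statistic for the nonzero values, with the sum referred to a chi-square distribution with 2 degrees of freedom. The function name `two_part_t`, the pooled-variance choice, and the large-sample chi-square reference are assumptions made for this example and may differ from the paper's exact definition.

```python
import math
from statistics import mean, variance


def two_part_t(x, y):
    """Hypothetical sketch of a two-part t statistic for two groups.

    Returns (statistic, p_value). Assumes the point mass is at zero and
    that each group has at least two nonzero observations.
    """
    n1, n2 = len(x), len(y)
    pos1 = [v for v in x if v != 0]  # continuous component, group 1
    pos2 = [v for v in y if v != 0]  # continuous component, group 2
    zeros1, zeros2 = n1 - len(pos1), n2 - len(pos2)

    # Part 1: pooled two-proportion z statistic for the share of zeros.
    p_pool = (zeros1 + zeros2) / (n1 + n2)
    se_b = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se_b == 0:
        raise ValueError("proportion part is degenerate (all zeros or no zeros)")
    b = (zeros1 / n1 - zeros2 / n2) / se_b

    # Part 2: pooled-variance t statistic on the nonzero values.
    m1, m2 = len(pos1), len(pos2)
    if m1 < 2 or m2 < 2:
        raise ValueError("need at least two nonzero values per group")
    sp2 = ((m1 - 1) * variance(pos1) + (m2 - 1) * variance(pos2)) / (m1 + m2 - 2)
    if sp2 == 0:
        raise ValueError("nonzero values have zero pooled variance")
    t = (mean(pos1) - mean(pos2)) / math.sqrt(sp2 * (1 / m1 + 1 / m2))

    # Combine both parts; the chi-square(2) survival function is exp(-s / 2).
    stat = b * b + t * t
    return stat, math.exp(-stat / 2)


if __name__ == "__main__":
    group_a = [0, 0, 0, 0, 1.0, 2.0]
    group_b = [0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(two_part_t(group_a, group_b))
```

Treating the squared t statistic as chi-square with 1 degree of freedom is a large-sample approximation, so p-values from this sketch should be read with caution for small groups. Swapping the second part for a Wilcoxon rank-sum z statistic would give the analogous Two-part W form.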
