Hypothesis Tests for Point-Mass Mixture Data with Application to 'Omics Data with Many Zero Values

Sandra Taylor and Katherine Pollard

Abstract

Data composed of a continuous component plus a point mass frequently arise in genomic studies. The distribution of this type of data is characterized by the proportion of observations in the point mass and the distribution of the continuous component. Standard statistical methods focus on one of these effects at a time and can fail to detect differences between experimental groups. We propose a novel empirical likelihood ratio test (LRT) statistic for simultaneously testing the null hypothesis of no difference in point-mass proportions and no difference in means of the continuous component. This study evaluates the performance of the empirical LRT and three existing point-mass mixture statistics: 1) a two-part statistic with a t-test for testing mean differences (Two-part t), 2) a two-part statistic with a Wilcoxon test for testing mean differences (Two-part W), and 3) the parametric LRT.

Our investigations begin with an analysis of metabolomics data from Arabidopsis thaliana, which contain many metabolites with a large proportion of observed concentrations in a point mass at zero. All four point-mass mixture statistics identify more significant differences than standard t-tests and Wilcoxon tests. The empirical LRT appears particularly effective. These findings motivate a large simulation study that assesses the Type I and Type II error of the four test statistics with various choices of null distribution. The parametric LRT is frequently the most powerful test, as long as the model assumptions are correct. As is common in 'omics data, the Arabidopsis metabolites have widely varying concentration distributions. A single parametric distribution cannot effectively represent all of these distributions, and individually selecting the optimal parametric distribution to use in the LRT for each metabolite is not practical. The empirical LRT, which does not require parametric assumptions, provides an attractive alternative to parametric and standard methods.
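The two-part statistics named above combine a test on the point-mass proportion with a test on the continuous values. The following is a minimal Python sketch of the Two-part t idea, not the authors' code: it assumes the point mass is an exact value (zero by default), uses a two-proportion z-test for the binary part and a Welch t-statistic for the continuous part, and refers the sum of squares to a chi-square distribution whose degrees of freedom equal the number of parts that could be computed. The function name `two_part_t` and the Welch form are illustrative assumptions.

```python
import math
import statistics


def two_part_t(x, y, point_mass=0.0):
    """Hypothetical two-part statistic: binary z-test plus continuous t-test.

    Returns (X2, p_value, df). Each part is treated as approximately standard
    normal under the null, so X2 is referred to chi-square(df).
    """
    n1, n2 = len(x), len(y)
    k1 = sum(1 for v in x if v == point_mass)
    k2 = sum(1 for v in y if v == point_mass)
    stat, df = 0.0, 0

    # Binary part: two-proportion z-test on the point-mass proportions.
    pooled = (k1 + k2) / (n1 + n2)
    if 0.0 < pooled < 1.0:
        se_b = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z_b = (k1 / n1 - k2 / n2) / se_b
        stat += z_b ** 2
        df += 1

    # Continuous part: Welch t-statistic on the non-point-mass values only.
    cx = [v for v in x if v != point_mass]
    cy = [v for v in y if v != point_mass]
    if len(cx) >= 2 and len(cy) >= 2:
        se_c = math.sqrt(statistics.variance(cx) / len(cx)
                         + statistics.variance(cy) / len(cy))
        if se_c > 0:
            t_c = (statistics.fmean(cx) - statistics.fmean(cy)) / se_c
            stat += t_c ** 2
            df += 1

    # Closed-form chi-square survival function for df = 1 or 2.
    if df == 0:
        p_value = 1.0
    elif df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))
    else:
        p_value = math.exp(-stat / 2)
    return stat, p_value, df
```

A Two-part W variant would, under the same assumptions, replace the Welch t-statistic with the normal-approximation z-score of a Wilcoxon rank-sum test on the non-zero values. With small numbers of non-zero observations the normal approximation for the continuous part is rough, so the p-value should be read as approximate.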

机译：在基因组研究中经常出现由连续成分加点质量组成的数据。这种类型的数据的分布特征在于观察点在点质量中的比例和连续分量的分布。标准的统计方法一次只能关注这些影响之一，并且可能无法检测到实验组之间的差异。我们提出了一种新颖的经验似然比检验（LRT）统计量，用于同时检验点质量比例无差异且连续分量均值无差异的原假设。这项研究评估了经验LRT和三点混合质点统计的性能：1）两部分统计采用t检验来检验均数差（两部分t），2）两部分统计采用Wilcoxon检验来检验我们的研究从分析拟南芥（Arabidopsis thaliana）的代谢组学数据开始，该代谢组学包含许多代谢物，其观察到的浓度在零质量点处为零。所有四个点质量混合统计数据比标准检验和Wilcoxon检验显示出更大的差异。经验轻轨似乎特别有效。这些发现激发了一个大型的仿真研究，该仿真研究评估了四种检验统计量n的I型和II型误差，并选择了各种零分布。只要模型假设正确，参数LRT通常是最强大的测试。正如'组学数据中常见的一样，拟南芥代谢产物具有广泛不同的浓度分布。单个参数分布不能有效地代表所有这些分布，并且单独选择用于每种代谢物的LRT中使用的最佳参数分布是不切实际的。不需要参数假设的经验LRT提供了一种有吸引力的替代参数和标准方法的方法。
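The parametric LRT needs a specific model for the continuous component. As an illustration only, the sketch below assumes a zero-inflated log-normal model: each value is zero with group-specific probability, otherwise its logarithm is normal with a group-specific mean and a variance shared across groups. Because this mixture likelihood factorizes into a Bernoulli part and a continuous part, the statistic is the sum of a binomial LRT and a normal-theory LRT on the log scale, referred to chi-square with 2 degrees of freedom (one constraint on the proportions, one on the means). The names `parametric_lrt`, `_binom_loglik`, and `_xlogy` are hypothetical, and the paper may use different distributions.

```python
import math
import statistics


def _xlogy(a, b):
    """a * log(b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def _binom_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k point-mass values out of n."""
    p = k / n
    return _xlogy(k, p) + _xlogy(n - k, 1 - p)


def parametric_lrt(x, y):
    """Hypothetical LRT for zero-inflated log-normal data (values >= 0).

    H0: equal zero proportions and equal log-scale means. Returns
    (statistic, p_value) using a chi-square(2) reference distribution.
    """
    groups = [x, y]
    n = [len(g) for g in groups]
    k = [sum(1 for v in g if v == 0) for g in groups]

    # Binary part: binomial LRT for equal point-mass proportions.
    lrt_b = 2 * (_binom_loglik(k[0], n[0]) + _binom_loglik(k[1], n[1])
                 - _binom_loglik(sum(k), sum(n)))

    # Continuous part: normal LRT on the log scale with a shared variance,
    # which reduces to m * log(SST / SSW).
    logs = [[math.log(v) for v in g if v != 0] for g in groups]
    if any(len(l) < 2 for l in logs):
        raise ValueError("each group needs at least two positive values")
    pooled_logs = logs[0] + logs[1]
    grand = statistics.fmean(pooled_logs)
    sst = sum((v - grand) ** 2 for v in pooled_logs)
    ssw = sum(sum((v - statistics.fmean(l)) ** 2 for v in l) for l in logs)
    if ssw == 0:
        raise ValueError("zero within-group variance on the log scale")
    lrt_c = len(pooled_logs) * math.log(sst / ssw)

    stat = lrt_b + lrt_c
    return stat, math.exp(-stat / 2)  # chi-square(2) survival function
```

This sketch also shows why the abstract stresses model assumptions: if a metabolite's non-zero concentrations are not log-normal, the chi-square reference distribution for this statistic may be miscalibrated, which is the situation the empirical LRT is meant to avoid.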

Bibliographic details

Source: Statistical Applications in Genetics and Molecular Biology, 2009, Issue 1, pp. 1-45 (45 pages)
Authors: Sandra Taylor and Katherine Pollard
Affiliations: *University of California, Davis, staylor@wald.ucdavis.edu; †University of California, San Francisco, katherine.pollard@gladstone.ucsf.edu
Original format: PDF
Language: English
Keywords: point-mass mixture, empirical likelihood, two-part statistics, likelihood ratio test

Similar literature

1. Multiscale null hypothesis testing for network-valued data: Analysis of brain networks of patients with autism [J]. Ilenia Lovato, Alessia Pini, Aymeric Stamm. Journal of the Royal Statistical Society, 2021, Issue Pta2.
2. Large-scale simultaneous hypothesis testing in monitoring carbon content from French soil database: a semi-parametric mixture approach [J]. Chauveau D., Saby N. P. A., Orton T. G. Geoderma: An International Journal of Soil Science, 2014.
3. Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis [J]. Marín J.M., Rodríguez-Bernal M.T. Computational Statistics & Data Analysis, 2012, Issue 6.
4. When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, l_2-consistency and Neuroscience Applications [C]. Hao Henry Zhou, Yilin Zhang, Vamsi K. Ithapu. International Conference on Machine Learning, 2018.
5. Statistical methods for analyzing 'omics data with emphasis on point-mass mixtures [D]. Taylor, Sandra Lynn. 2009.
6. A probit-log-skew-normal mixture model for repeated measures data with excess zeros with application to a cohort study of paediatric respiratory symptoms [O]. Sadia Mahmud, WY Wendy Lou, Neil W Johnston. 2010.
7. Inference of nonparametric hypothesis testing on high dimensional longitudinal data and its application in DNA copy number variation and micro array data analysis [O]. Zhang Ke 100.
8. Accumulate-Toward-the-Mode Approach to Confidence Intervals and Hypothesis Testing With Applications to Binomially Distributed Data [R]. Nuzman, D. W. 2010.