Journal: Italian Journal of Public Health

Parametric and nonparametric two-sample tests for feature screening in class comparison: a simulation study


Abstract

Background. Identifying a location-, scale- and shape-sensitive test to detect differentially expressed features between two comparison groups is a key point in high-dimensional studies. The most commonly used tests target differences in location, but general distributional discrepancies might be important to reveal differential biological processes. Methods. A simulation study was conducted to compare the performance of a set of two-sample tests under different distributional patterns: Student's t, Welch's t, Wilcoxon-Mann-Whitney, Podgor-Gastwirth PG2, Cucconi, Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD) and Zhang tests (Z_K, Z_C and Z_A). We applied the same tests to a real data example. Results. The AD, CvM, Z_A and Z_C tests proved to be the most sensitive in mixture distribution patterns, while still maintaining high power in normal distribution patterns. At best, the AD test showed a loss in power of ~2% in the comparison of two normal distributions, but a gain of ~32% with mixture distributions, relative to the parametric tests. Accordingly, the AD test detected the greatest number of differentially expressed features in the real data application. Conclusion. The tests for the general two-sample problem introduce a more general concept of 'differential expression', thus overcoming the limitations of the other tests, which are restricted to specific moments of the feature distributions. In particular, the AD test should be considered a powerful alternative to the parametric tests for feature screening, in order to keep as many discriminative features as possible for the class prediction analysis.
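The kind of power comparison described in the Methods can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the sample size, the number of replicates, the significance level, and the mixture (an equal-weight blend of N(-2, 1) and N(2, 1) compared against N(0, 1), so both groups share the same mean but differ in shape and spread). Only tests available in SciPy under names I am confident of are included; the Anderson-Darling, Cucconi, Podgor-Gastwirth and Zhang tests are left out to keep the sketch short. The function name `simulate_power` is hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_power(n=30, n_sim=200, alpha=0.05, seed=0):
    """Estimate the rejection rate of several two-sample tests.

    Group 1 is N(0, 1); group 2 is an equal-weight mixture of
    N(-2, 1) and N(2, 1). These settings are illustrative assumptions,
    not the scenarios used in the original simulation study.
    """
    rng = np.random.default_rng(seed)

    # Each entry maps a label to a function returning a two-sided p-value.
    tests = {
        "student_t": lambda x, y: stats.ttest_ind(x, y, equal_var=True).pvalue,
        "welch_t": lambda x, y: stats.ttest_ind(x, y, equal_var=False).pvalue,
        "wmw": lambda x, y: stats.mannwhitneyu(
            x, y, alternative="two-sided"
        ).pvalue,
        "ks": lambda x, y: stats.ks_2samp(x, y).pvalue,
        "cvm": lambda x, y: stats.cramervonmises_2samp(x, y).pvalue,
    }
    rejections = dict.fromkeys(tests, 0)

    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, size=n)
        # Draw the mixture: pick a component per observation, then sample.
        from_left = rng.random(n) < 0.5
        y = np.where(
            from_left,
            rng.normal(-2.0, 1.0, size=n),
            rng.normal(2.0, 1.0, size=n),
        )
        for name, test in tests.items():
            if test(x, y) < alpha:
                rejections[name] += 1

    # Proportion of replicates in which each test rejected at level alpha.
    return {name: count / n_sim for name, count in rejections.items()}


if __name__ == "__main__":
    for name, power in simulate_power().items():
        print(f"{name}: {power:.2f}")
```

Because the two groups in this assumed scenario have equal means, tests aimed at location would be expected to reject rarely, while distribution-wide tests such as KS and CvM can respond to the difference in shape. The exact rejection rates depend on the chosen settings and should not be read as a reproduction of the reported results.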