Journal: Bioinformatics

DNA microarray data imputation and significance analysis of differential expression


Abstract

Motivation: Significance analysis of differential expression in DNA microarray data is an important task. Much of the current research is focused on developing improved tests and software tools. The task is difficult not only owing to the high dimensionality of the data (number of genes), but also because of the often non-negligible presence of missing values. There is thus a great need to reliably impute these missing values prior to the statistical analyses. Many imputation methods have been developed for DNA microarray data, but their impact on statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression.

Results: We develop a new imputation method (LinCmb) that is superior to the widely used methods in terms of normalized root mean squared error. Its estimates are the convex combinations of the estimates of existing methods. We find that LinCmb adapts to the structure of the data: If the data are heterogeneous or if there are few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or if there are many missing values, LinCmb puts more weight on global imputation methods. Thus, LinCmb is a useful tool to understand the merits of different imputation methods. We also demonstrate that missing values affect significance analysis. Two datasets, different amounts of missing values, different imputation methods, the standard t-test and the regularized t-test and ANOVA are employed in the simulations. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC and BPCA. Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.
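The idea of combining existing imputation estimates with convex weights can be illustrated with a small sketch. The abstract does not say how LinCmb fits its weights, so the code below is an assumption for illustration only: it fits non-negative weights that sum to one by least squares on entries whose true values are known (for example, values masked on purpose), using projected gradient descent. The function names, the synthetic data, and the choice to normalize RMSE by the standard deviation of the true values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def convex_weights(estimates, truth, n_iter=5000):
    """Least-squares weights constrained to the simplex.

    estimates: array (n_entries, n_methods), one column per imputation method
    truth:     array (n_entries,), known values of the same entries
    """
    n_methods = estimates.shape[1]
    w = np.full(n_methods, 1.0 / n_methods)   # start from equal weights
    # Step size 1/L, where L is the largest eigenvalue of E^T E.
    step = 1.0 / np.linalg.norm(estimates, 2) ** 2
    for _ in range(n_iter):
        grad = estimates.T @ (estimates @ w - truth)
        w = project_to_simplex(w - step * grad)
    return w

def nrmse(imputed, truth):
    """RMSE divided by the standard deviation of the true values
    (one common normalization; assumed here, not confirmed by the source)."""
    return np.sqrt(np.mean((imputed - truth) ** 2)) / np.std(truth)

# Synthetic stand-ins for three imputation methods of differing accuracy.
rng = np.random.default_rng(0)
truth = rng.normal(size=500)
estimates = np.column_stack([
    truth + rng.normal(scale=0.3, size=500),  # hypothetical accurate method
    truth + rng.normal(scale=0.6, size=500),  # hypothetical middling method
    truth + rng.normal(scale=1.0, size=500),  # hypothetical noisy method
])

weights = convex_weights(estimates, truth)
combined = estimates @ weights
single_errors = [nrmse(estimates[:, j], truth) for j in range(estimates.shape[1])]
print("weights:", np.round(weights, 3))
print("NRMSE per method:", np.round(single_errors, 3))
print("NRMSE combined:", round(float(nrmse(combined, truth)), 3))
```

Because each single method corresponds to a corner of the weight simplex, the fitted combination cannot have a larger squared error on the fitting entries than the best individual method. In practice the weights would be fit on one set of masked entries and evaluated on another.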
