Briefings in Bioinformatics

Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices

Abstract

With expanded access to, and decreasing costs of, mass spectrometry, investigators are collecting and analyzing multiple biological matrices, such as serum, plasma, tissue and urine, from the same subject to enhance biomarker discovery, improve understanding of disease processes and identify therapeutic targets. Commonly, each biological matrix is analyzed separately, but multivariate methods such as MANOVA that combine information from multiple biological matrices are potentially more powerful. However, mass spectrometric data typically contain large amounts of missing values, and imputation is often used to create complete data sets for analysis. The effects of imputation on analyses of multiple biological matrices have not been studied. We investigated the effects of seven imputation methods (half-minimum substitution, mean substitution, k-nearest neighbors, local least squares regression, Bayesian principal component analysis, singular value decomposition and random forest) on the within-subject correlation of compounds between biological matrices and the consequences for MANOVA results. Through analysis of three real omics data sets and simulation studies, we found that both the amount of missing data and the choice of imputation method substantially changed the between-matrix correlation structure. The magnitude of the correlations was generally reduced in imputed data sets, and this effect increased with the amount of missing data. Significant MANOVA results were also substantially affected; in particular, the number of false positives increased with the level of missing data for all imputation methods. No single imputation method was universally best, but the simple substitution methods (half minimum and mean) consistently performed poorly.
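The central finding, that imputation weakens between-matrix correlation and does so more as missingness grows, can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation design. It assumes one compound measured in two matrices (labeled serum and urine only for concreteness), log-normal intensities whose log2 values have a true within-subject correlation of 0.8, and values missing completely at random (MCAR), independently in each matrix. MCAR is a simplifying assumption, since missingness in mass spectrometry often depends on abundance. Only the two simple substitution methods are compared, and every function name and parameter value below is hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate_pair(n, rho, rng):
    """Hypothetical raw intensities of one compound in two matrices for the
    same n subjects; the log2 intensities have correlation rho."""
    serum, urine = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        serum.append(2 ** (10 + z1))
        urine.append(2 ** (10 + rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return serum, urine


def mask_mcar(values, p, rng):
    """Hide each value independently with probability p (assumed MCAR)."""
    return [None if rng.random() < p else v for v in values]


def half_min_impute(values):
    """Replace missing values with half the smallest observed raw intensity."""
    fill = min(v for v in values if v is not None) / 2
    return [fill if v is None else v for v in values]


def mean_impute(values):
    """Replace missing values with the mean of the observed raw intensities."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def log2_corr(a, b):
    """Correlation computed on the log2 scale after imputing on the raw scale."""
    return pearson([math.log2(v) for v in a], [math.log2(v) for v in b])


def run(n=2000, rho=0.8, missing_rates=(0.1, 0.3, 0.5), seed=1):
    """Return the complete-data correlation and, for each missing rate,
    the correlation after half-minimum and after mean substitution."""
    rng = random.Random(seed)
    serum, urine = simulate_pair(n, rho, rng)
    complete_r = log2_corr(serum, urine)
    imputed = {}
    for p in missing_rates:
        s_mis, u_mis = mask_mcar(serum, p, rng), mask_mcar(urine, p, rng)
        imputed[p] = {
            "half_min": log2_corr(half_min_impute(s_mis), half_min_impute(u_mis)),
            "mean": log2_corr(mean_impute(s_mis), mean_impute(u_mis)),
        }
    return complete_r, imputed


if __name__ == "__main__":
    complete_r, imputed = run()
    print(f"complete data: r = {complete_r:.3f}")
    for p, r in imputed.items():
        print(f"{p:.0%} missing: half-min r = {r['half_min']:.3f}, "
              f"mean r = {r['mean']:.3f}")
```

Under MCAR, a substituted value carries no information about the paired value in the other matrix, so it adds nothing to the covariance. With missing rate p in each matrix, mean substitution therefore shrinks the correlation by roughly a factor of (1 − p), while half-minimum substitution also places imputed points far below the bulk of the log-scale data, inflating the variance and attenuating the correlation further.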