Journal: Bioinformatics

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data



Abstract

Motivation: Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques are needed so that further analysis of biological data can be undertaken accurately. This paper presents an innovative missing value imputation algorithm called collateral missing value estimation (CMVE), which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods.

Results: The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least squares impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both time series and non-time series data, for the same order of computational complexity.

A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm.
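
The evaluation protocol described in the abstract (randomly remove entries at a given probability, impute them, then score the estimate with NRMS error) can be illustrated with a minimal sketch. This is not the CMVE algorithm, whose details the abstract does not give; it uses a plain KNN baseline instead. The function names `mask_at_random`, `knn_impute` and `nrms_error` are hypothetical, the synthetic matrix merely stands in for real expression data, and the NRMS definition used here (Frobenius norm of the error divided by the Frobenius norm of the true matrix) is one common form assumed for illustration.

```python
import math
import random


def mask_at_random(data, rate, rng):
    """Return a copy of `data` with each entry replaced by None with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in data]


def knn_impute(masked, k=3):
    """Fill each None with the mean of that column over the k nearest rows (genes).

    Distance between two rows is the root mean squared difference over the
    columns observed in both. If no neighbour has the column observed, the
    row's own mean is used (0.0 if the row is entirely missing).
    """
    filled = [row[:] for row in masked]
    for i, row in enumerate(masked):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        dists = []
        for h, other in enumerate(masked):
            if h == i:
                continue
            shared = [(a - b) ** 2 for a, b in zip(row, other)
                      if a is not None and b is not None]
            if shared:
                dists.append((math.sqrt(sum(shared) / len(shared)), h))
        dists.sort()
        observed = [v for v in row if v is not None]
        fallback = sum(observed) / len(observed) if observed else 0.0
        for j in missing:
            donors = [masked[h][j] for _, h in dists if masked[h][j] is not None][:k]
            filled[i][j] = sum(donors) / len(donors) if donors else fallback
    return filled


def nrms_error(estimate, truth):
    """Assumed NRMS form: ||estimate - truth||_F / ||truth||_F."""
    num = sum((e - t) ** 2
              for est_row, true_row in zip(estimate, truth)
              for e, t in zip(est_row, true_row))
    den = sum(t ** 2 for true_row in truth for t in true_row)
    return math.sqrt(num / den)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 40 x 8 "genes x samples" matrix, used only for illustration.
    truth = [[math.sin(0.3 * g + 0.5 * s) for s in range(8)] for g in range(40)]
    for rate in (0.01, 0.05, 0.1, 0.2):
        masked = mask_at_random(truth, rate, rng)
        print(rate, round(nrms_error(knn_impute(masked), truth), 4))
```

Swapping `knn_impute` for another imputation routine with the same input and output shape would allow methods to be compared at each missing value probability, which is the kind of comparison the abstract reports.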
