BMC Bioinformatics

Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme



Abstract

Background: Gene expression profiling has become a useful biological resource in recent years, and it plays an important role in a broad range of areas in biology. The raw gene expression data, usually in the form of a large matrix, may contain missing values, so downstream analysis methods that require a complete matrix as input cannot be applied directly. Several methods have been developed to solve this problem, such as K-nearest-neighbor imputation and Bayesian principal component analysis imputation. In this paper, we introduce a novel imputation approach based on Support Vector Regression (SVR). To obtain better performance, the proposed approach uses an orthogonal coding input scheme, which makes use of multiple missing values in one row of a gene expression profile and imputes the missing value in a much higher-dimensional space.

Results: We present a comparative study of our method against previously developed methods for estimating missing values on six gene expression data sets. Among the three input-vector coding schemes we tried, the orthogonal input coding scheme gives the best estimates, with the minimum Normalized Root Mean Squared Error (NRMSE). The results also show that the SVR method estimates well on different kinds of data sets, with relatively small NRMSE.

Conclusion: The SVR imputation method performs better than, or at least comparably with, the previously developed methods in this study. Its estimation ability is partly due to the fuller use of missing-value information made possible by the orthogonal input coding scheme. The solid theoretical foundation of SVR, together with the orthogonal input coding scheme, also contributes to the estimation performance. The estimation ability demonstrated in the results suggests that the proposed approach is a suitable solution to the missing value estimation problem. The source code of the SVR method is available from http://202.38.78.189/downloads/svrimpute.html for non-commercial use.
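The abstract compares methods by NRMSE but does not define it. As a minimal sketch, the code below assumes a definition that is common in the missing-value imputation literature: the root mean squared error between estimated and true values, divided by the standard deviation of the true values. The function name `nrmse`, the choice of population (rather than sample) standard deviation, and the example numbers are all illustrative assumptions, not taken from the paper.

```python
import math
import statistics


def nrmse(estimated, actual):
    """RMSE of the estimates divided by the standard deviation of the true values.

    Assumed definition; the population standard deviation is used here.
    """
    if len(estimated) != len(actual) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mse = sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual)
    return math.sqrt(mse) / statistics.pstdev(actual)


# Hypothetical held-out expression values that were masked as "missing".
actual = [0.5, -1.2, 2.0, 0.3]

# Baseline: fill every masked entry with the mean of the true values (0.4).
mean_guess = [0.4, 0.4, 0.4, 0.4]

# A hypothetical imputation that tracks the true values more closely.
closer_guess = [0.6, -1.0, 1.8, 0.2]

print(nrmse(mean_guess, actual))
print(nrmse(closer_guess, actual))
```

Under this assumed definition, filling every entry with the mean of the true values scores an NRMSE of 1, so a value well below 1 means the imputation recovers structure that a constant guess cannot.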
