Conference: 2011 1st Middle East Conference on Biomedical Engineering

Evaluation of missing values imputation methods in cDNA microarrays based on classification accuracy

Abstract

Many attempts have been made to deal with missing values (MV) in microarray data representing gene expression. Missing values are problematic because many data analysis techniques are not robust to missing data. Most MV imputation methods currently in use have been evaluated only in terms of the similarity between the original and imputed data. However, the imputed expression values are not of interest in themselves; the major concern is whether they are reliable enough to use in subsequent analysis. This paper studies the impact of different MV imputation methods on classification accuracy. The experimental work first implemented three popular imputation methods: Singular Value Decomposition (SVD), weighted K-nearest neighbors (KNNimpute), and zero replacement. The robustness of the three methods to the amount of missing data was then studied by repeating the experiments on datasets with missing rates (MR) ranging from 0 to 20%. For supervised two-class classification we adopted a twofold approach, presenting the classifiers with all gene expressions as well as with a subset of selected genes. Gene selection used Fisher Discriminant Analysis (FDA) as the feature selection method, which noticeably improved classifier performance. The classifier accuracies obtained on imputed data from the three imputation methods show only slight variation over the specified MR range, indicating that the three methods are robust.
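To make two of the imputation methods named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a genes-by-samples NumPy matrix with missing entries stored as NaN. The function names (`mask_at_rate`, `zero_impute`, `knn_impute`), the root-mean-square distance over shared observed columns, the inverse-distance weighting, the default `k`, and the zero fallback are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def mask_at_rate(X, rate, rng):
    """Return a float copy of X with roughly `rate` of its entries set to NaN."""
    X_missing = np.asarray(X, dtype=float).copy()
    X_missing[rng.random(X_missing.shape) < rate] = np.nan
    return X_missing


def zero_impute(X):
    """Zero replacement: every missing entry becomes 0."""
    return np.where(np.isnan(X), 0.0, X)


def knn_impute(X, k=10):
    """Weighted KNN imputation over rows (genes).

    Each missing entry (i, j) is filled with an inverse-distance-weighted
    average of column j taken from the k genes closest to gene i.
    Distances use only the columns observed in both genes (an assumption).
    """
    X = np.asarray(X, dtype=float)
    X_filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        observed_i = ~missing[i]
        for j in np.where(missing[i])[0]:
            dists, values = [], []
            for g in range(X.shape[0]):
                # A neighbour must be a different gene with column j observed.
                if g == i or missing[g, j]:
                    continue
                shared = observed_i & ~missing[g]
                if not shared.any():
                    continue
                diff = X[i, shared] - X[g, shared]
                dists.append(np.sqrt(np.mean(diff ** 2)))
                values.append(X[g, j])
            if not dists:
                # Assumed fallback when no usable neighbour exists.
                X_filled[i, j] = 0.0
                continue
            dists = np.array(dists)
            values = np.array(values)
            nearest = np.argsort(dists)[:k]
            weights = 1.0 / (dists[nearest] + 1e-12)  # avoid division by zero
            X_filled[i, j] = np.sum(weights * values[nearest]) / np.sum(weights)
    return X_filled
```

A robustness study along the lines described would call `mask_at_rate` with rates between 0 and 0.20 on a complete expression matrix, impute each masked copy, and then compare classifier accuracy across rates. The classifier and the FDA gene selection step are outside the scope of this sketch.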