Journal: Bioinformatics

Effects of replacing the unreliable cDNA microarray measurements on the disease classification based on gene expression profiles and functional modules

Abstract

Motivation: Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced before subsequent data mining. This paper studies the effects of different MV treatments for cDNA microarray data on disease classification analysis.

Results: By analyzing five datasets, we demonstrate that among the three kinds of classifiers evaluated in this study, support vector machine (SVM) classifiers are robust to varied MV imputation methods [e.g. replacing MVs by zero, the K nearest-neighbor (KNN) imputation algorithm, local least squares imputation and Bayesian principal component analysis], while the classification and regression tree classifiers are sensitive in terms of classification accuracy. The KNN classifiers built on differentially expressed genes (DEGs) are robust to the varied MV treatments, but the performance of the KNN classifiers based on all measured genes can deteriorate significantly when MVs are imputed for genes with a larger missing rate (MR) (e.g. MR > 5%). Generally, while replacing MVs by zero performs relatively poorly, the other imputation algorithms differ little in their effect on the classification performance of the SVM or KNN classifiers. We further demonstrate the power and feasibility of our recently proposed functional expression profile (FEP) approach as a means of handling microarray data with MVs. The FEPs, which are derived from the functional modules that are enriched with sets of DEGs and thus can be consistently identified under varied MV treatments, achieve precise disease classification with better biological interpretation. We conclude that the choice of MV treatment should be determined in the context of the approach later used for disease classification. The suggested exclusion criterion of ignoring genes with a larger MR (e.g. > 5%), while justifiable for some classifiers such as KNN classifiers, might not be suitable as a general rule for all classifiers.
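To make the MV treatments named in the abstract concrete, the following is a minimal sketch of three of them: computing a per-gene missing rate, excluding genes above an MR threshold, and imputing by zero or by nearest neighbors. It is not the authors' code. The function names, the toy matrix, the use of `None` for a missing value, and the unweighted neighbor average are all assumptions made for illustration; published KNN imputation methods typically use a distance-weighted average.

```python
from math import sqrt

# Toy expression matrix (hypothetical data): rows are genes, columns are samples.
# None marks a missing value (MV).
toy = [
    [1.0, 2.0, None, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.2, 4.1],
    [-5.0, 0.0, 9.0, 2.0],
]

def missing_rate(row):
    """Fraction of missing entries in one gene's measurements."""
    return sum(v is None for v in row) / len(row)

def filter_by_missing_rate(matrix, max_mr=0.05):
    """Exclusion criterion: keep only genes with MR <= max_mr."""
    return [row for row in matrix if missing_rate(row) <= max_mr]

def impute_zero(matrix):
    """Replace every MV by zero."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

def _distance(a, b):
    """Root-mean-square difference over samples observed in both genes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def impute_knn(matrix, k=2):
    """Simplified KNN imputation: fill each MV with the unweighted mean of
    the k most similar genes that have a value in the same sample."""
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != float("inf")]
            # Fall back to zero when no comparable gene exists (an assumption).
            result[i][j] = sum(nearest) / len(nearest) if nearest else 0.0
    return result

zero_filled = impute_zero(toy)
knn_filled = impute_knn(toy, k=2)
kept = filter_by_missing_rate(toy, max_mr=0.05)
```

In this toy case the first gene has an MR of 25%, so the 5% exclusion rule drops it, while the two imputation functions keep it and fill the gap differently. Comparing a classifier trained on `zero_filled`, `knn_filled` and `kept` is the kind of experiment the abstract describes, although the study itself also covers local least squares and Bayesian principal component analysis imputation, which are not sketched here.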