BMC Bioinformatics

Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data


Abstract

Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. Selecting the genes that are important for distinguishing the sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes, recursive cluster elimination (RCE), as an alternative to recursive feature elimination (RFE). We tested this algorithm on six datasets and compared its performance with that of two related classification procedures that use RFE.

Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method that identifies correlated gene clusters, with Support Vector Machines (SVMs), a supervised machine learning classification method that identifies and scores (ranks) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove the clusters of genes that contribute the least to classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Using gene clusters rather than individual genes improves supervised classification accuracy on the same data relative to SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE), which remove genes based on their individual discriminant weights.

Conclusion: SVM-RCE provides improved classification accuracy on complex microarray datasets compared with SVM-RFE or PDA-RFE applied to the same datasets. SVM-RCE identifies clusters of correlated genes that, when considered together, provide greater insight into the structure of the microarray data. Clustering genes for classification appears to result in some concomitant clustering of samples into subgroups. Our present implementation of SVM-RCE groups genes using the correlation metric. The success of the SVM-RCE method in classification suggests that gene interaction networks or other biologically relevant metrics that group genes based on functional parameters might also be useful.
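
The cluster-then-eliminate loop described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: surviving genes are re-clustered after every elimination round, a cluster's score is the 3-fold cross-validated accuracy of a linear SVM trained on that cluster's genes only, the SVM is trained with a simple Pegasos-style sub-gradient method, 30% of clusters are dropped per round, and the data are synthetic. All function names and parameters (`svm_rce`, `cluster_score`, `drop`, and so on) are hypothetical.

```python
import random

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def kmeans_genes(profiles, k, rng, iters=20):
    """K-means-style grouping: each gene joins the centroid it correlates with most."""
    centroids = rng.sample(profiles, k)
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: pearson(p, centroids[c])) for p in profiles]
        for c in range(k):
            members = [p for p, a in zip(profiles, assign) if a == c]
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    return [c for c in clusters if c]

def train_linear_svm(X, y, rng, lam=0.01, epochs=30):
    """Pegasos-style sub-gradient descent on the hinge loss; labels must be +1/-1."""
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def cluster_score(X, y, genes, rng, folds=3):
    """Cross-validated accuracy of an SVM that sees only the genes in one cluster."""
    rows = [[x[g] for g in genes] + [1.0] for x in X]  # constant 1.0 acts as a bias
    order = rng.sample(range(len(rows)), len(rows))
    correct = 0
    for f in range(folds):
        test = order[f::folds]
        train = [i for i in order if i not in test]
        w = train_linear_svm([rows[i] for i in train], [y[i] for i in train], rng)
        for i in test:
            pred = 1 if sum(wj * xj for wj, xj in zip(w, rows[i])) >= 0 else -1
            correct += pred == y[i]
    return correct / len(rows)

def svm_rce(X, y, n_clusters=8, final_clusters=2, drop=0.3, seed=0):
    """Return (surviving gene indices, their final clusters) after cluster elimination."""
    rng = random.Random(seed)
    genes, k = list(range(len(X[0]))), n_clusters
    while True:
        profiles = [[x[g] for x in X] for g in genes]  # one expression profile per gene
        clusters = kmeans_genes(profiles, min(k, len(genes)), rng)
        clusters = [[genes[i] for i in c] for c in clusters]
        if k <= final_clusters:
            return sorted(genes), clusters
        ranked = sorted(clusters, key=lambda c: cluster_score(X, y, c, rng), reverse=True)
        keep = max(1, int(len(ranked) * (1 - drop)))  # drop the lowest-scoring clusters
        genes = [g for c in ranked[:keep] for g in c]
        k = max(final_clusters, int(k * (1 - drop)))

def demo_data(rng, n_samples=20, n_genes=12, n_informative=4):
    """Synthetic data: the first n_informative genes shift with the class label."""
    y = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
    X = [[rng.gauss(2.0 * lab if g < n_informative else 0.0, 1.0)
          for g in range(n_genes)] for lab in y]
    return X, y

X, y = demo_data(random.Random(1))
genes, clusters = svm_rce(X, y)
print("surviving genes:", genes)
print("final clusters:", clusters)
```

The contrast with RFE lies in the unit that is ranked and removed. Here the SVM scores a whole cluster of correlated genes, and the cluster is kept or discarded as a group, whereas RFE ranks and removes genes one at a time by their individual weights. In practice a library SVM and K-means implementation would replace the hand-written ones used here for self-containment.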
