Conference: International Conference on Computing and Communications Technologies

A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier

Abstract

In cancer treatment, efficacy depends on diagnosing the nature of the tumor correctly and as early as possible. Microarray gene expression data, which contain the expression profiles of the entire genome, provide a source that can be analyzed to identify cancer biomarkers. Microarray data have a large number of features and very few samples. To use such data effectively, it is beneficial to select a reduced set of genes that can be used for tasks such as classification. In this paper, we propose a two-level scheme for feature selection and classification of cancers. First, the genes are ranked using Recursive Feature Elimination, which uses a Random Forest classifier to evaluate the fitness of genes with five-fold cross-validation. The selected genes are then used to pre-train an Unsupervised Deep Belief Network classifier, which classifies the samples based on those genes. We compare the results of our approach, in terms of classification accuracy, precision, and recall obtained under cross-validation, with the results obtained using several standard feature selector-classifier combinations: Mutual Information with Support Vector Machine, Kernel Principal Component Analysis with Support Vector Machine, Support Vector Machine-Recursive Feature Elimination, and Mutual Information with Random Forest classifier. The results show that our scheme performs on par with standard methods used for feature selection from gene expression data.
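
The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of a real microarray dataset, and substitutes two stacked `BernoulliRBM` layers followed by logistic regression as a rough stand-in for the Deep Belief Network, since scikit-learn has no DBN class. All sizes and hyperparameters (80 samples, 200 features, `step=20`, the RBM layer widths) are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for microarray data: few samples, many features (assumed sizes).
X, y = make_classification(n_samples=80, n_features=200, n_informative=10,
                           n_redundant=0, random_state=0)

# Five-fold stratified cross-validation, as in the abstract.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Level 1: Recursive Feature Elimination with a Random Forest ranking the
# features; the number of features kept is chosen by cross-validated accuracy.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=20,                    # drop 20 features per elimination round (assumption)
    cv=cv,
    scoring="accuracy",
    min_features_to_select=10,  # assumption
)
selector.fit(X, y)
X_sel = X[:, selector.support_]  # keep only the selected columns

# Level 2: unsupervised RBM layers followed by a supervised output layer.
# This only approximates a DBN: the RBMs are trained greedily and are not
# fine-tuned jointly with the classifier.
dbn_like = make_pipeline(
    MinMaxScaler(),  # RBMs expect inputs in [0, 1]
    BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0),
    BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0),
    LogisticRegression(max_iter=1000),
)

# Report the three metrics named in the abstract, averaged over the folds.
scores = cross_validate(dbn_like, X_sel, y, cv=cv,
                        scoring=("accuracy", "precision", "recall"))
print("selected features:", selector.n_features_)
for name in ("accuracy", "precision", "recall"):
    print(name, scores["test_" + name].mean())
```

In this sketch the features are selected on the full dataset before the classifier is cross-validated, which can make the reported scores optimistic; wrapping both levels in a single pipeline and cross-validating that pipeline would avoid the leak.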