Journal: Intelligent Data Analysis

Multiclass cancer diagnosis in microarray gene expression profile using mutual information and Support Vector Machine

Abstract

Gene expression profiles have recently been used for cancer classification. In this work, a multi-SVM (Support Vector Machine) approach with a novel gene selection method based on Mutual Information (MI) is developed for multi-class classification in cancer diagnosis. The mutual information between each gene and the class label is computed and used to identify the discriminating genes in each category. All genes are ranked by their mutual information value, and the optimal number of genes with the highest values is selected and fed into the classifier. The multi-SVM classifier constructs a separate classifier for each class, and the combined multi-class classifier assigns a tissue sample to the class with the highest support. The performance of the proposed Multiclass Support Vector Machine (mSVM) with MI-based gene selection is evaluated on four benchmark gene expression datasets for cancer diagnosis: the Leukemia, Lymphoma, NCI60, and GCM datasets. In these experiments, the multi-SVM approach yields the most effective classifier for accurate cancer diagnosis from gene expression data, and it outperforms other popular machine learning algorithms such as k-Nearest Neighbor. The simulation study shows that the proposed approach reduces the dimensionality of the input features by identifying the most discriminating gene subset for each category, and that it improves predictive accuracy for multi-class cancer diagnosis.
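The two steps described in the abstract, ranking genes by mutual information with the class label and assigning a sample to the class with the highest support, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The median-threshold discretization, the function names, and the toy data are all assumptions made for illustration, and the per-class scorers stand in for trained SVM decision functions, which are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Estimate I(X; Y) in bits from two equal-length discrete sequences."""
    n = len(values)
    px = Counter(values)
    py = Counter(labels)
    pxy = Counter(zip(values, labels))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def discretize(column):
    """Binarize expression values at the upper median.

    Assumption: the abstract does not state how expression values are
    discretized; a median split is used here only for illustration.
    """
    ordered = sorted(column)
    median = ordered[len(ordered) // 2]
    return [int(v >= median) for v in column]


def top_genes_by_mi(expression, labels, k):
    """Return the indices of the k genes with the highest MI to the labels.

    expression: list of samples, each a list of per-gene expression values.
    Ties are broken by the lower gene index.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(discretize(column), labels), g))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [g for _, g in scores[:k]]


def predict_one_vs_rest(scorers, sample):
    """Pick the class whose scorer gives the highest support for the sample.

    scorers: dict mapping class name to a function sample -> score
    (hypothetical stand-ins for per-class SVM decision functions).
    """
    return max(scorers, key=lambda cls: scorers[cls](sample))


# Toy data (made up): 4 samples x 3 genes, two classes.
toy_expression = [
    [1.0, 3.0, 2.0],
    [1.2, 7.0, 2.0],
    [5.0, 3.1, 2.0],
    [5.5, 7.2, 9.0],
]
toy_labels = [0, 0, 1, 1]
selected = top_genes_by_mi(toy_expression, toy_labels, k=1)
```

Passing the full multi-class label vector ranks genes against all classes at once. To rank genes separately for each category, as the abstract describes, the same function could be called once per class with binary "this class versus the rest" labels.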