Journal: BMC Medical Genomics

A novel gene selection algorithm for cancer classification using microarray datasets

Abstract

Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. The microarray datasets available for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This imbalance is known as the curse of dimensionality, and it makes classification challenging. Gene selection is a promising approach to this problem and plays an important role in developing efficient cancer classification, because only a small number of genes are related to the classification problem. Gene selection addresses several problems in microarray datasets: it reduces the number of irrelevant and noisy genes and selects the most relevant genes to improve the classification results. An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier of GSP. Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise across all three evaluation criteria, i.e., classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP shows superior performance in cancer classification compared to those selected by up-to-date representative gene selection methods. The gene subset selected by GSP can achieve higher classification accuracy with less processing time.
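
The trade-off the abstract describes, between classification accuracy and the number of selected genes, can be illustrated with a small sketch. This is not the GSP fitness function, which the abstract does not specify. The weighted formula, the `weight` parameter, the function names, and the toy data are all assumptions made for illustration, and a leave-one-out nearest-centroid classifier stands in for the linear-kernel SVM so that the example needs only the Python standard library.

```python
from statistics import mean


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Stand-in for the linear-kernel SVM named in the abstract (assumption).
    Only the expression values at the indices in `genes` are used.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Build one centroid per class from all samples except sample i.
        centroids = {}
        for cls in sorted(set(labels)):
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if j != i and l == cls]
            if rows:
                centroids[cls] = [mean(r[g] for r in rows) for g in genes]

        def squared_distance(cls):
            return sum((x[g] - c) ** 2 for g, c in zip(genes, centroids[cls]))

        predicted = min(centroids, key=squared_distance)
        if predicted == y:
            correct += 1
    return correct / len(samples)


def fitness(samples, labels, genes, weight=0.9):
    """Hypothetical fitness: reward accuracy, penalize large gene subsets."""
    if not genes:
        return 0.0
    total_genes = len(samples[0])
    accuracy = loo_accuracy(samples, labels, genes)
    compactness = 1 - len(genes) / total_genes
    return weight * accuracy + (1 - weight) * compactness


# Toy data (invented): 6 samples x 4 genes; only gene 0 separates the classes.
samples = [
    [0.1, 5.0, 2.0, 7.0],
    [0.2, 1.0, 9.0, 3.0],
    [0.0, 8.0, 4.0, 1.0],
    [0.9, 4.0, 3.0, 6.0],
    [1.0, 2.0, 8.0, 2.0],
    [0.8, 7.0, 5.0, 0.0],
]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

small_subset = fitness(samples, labels, [0])
all_genes = fitness(samples, labels, [0, 1, 2, 3])
print(small_subset > all_genes)
```

Under this assumed scoring, a subset containing only the informative gene scores higher than the full gene set, because the full set earns no compactness reward. A search procedure such as the evolutionary one the abstract describes would use a score of this kind to rank candidate gene subsets.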