Published in: Expert Systems with Applications

A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets


Abstract

Cancer is a dangerous disease that causes death worldwide. Identifying a small number of genes relevant to a given cancer can lead to effective treatments. The main challenge with microarray datasets is their high dimensionality: the number of features is huge compared with the modest number of samples. Recent research has attempted to reduce this dimensionality using various feature selection techniques. This paper presents an ensemble feature selection technique based on the t-test and a genetic algorithm. After the data are preprocessed with a t-test, a Nested Genetic Algorithm (Nested-GA) is used to find the optimal feature subset by combining data from two different datasets. Nested-GA consists of two nested genetic algorithms, outer and inner, that run on two different kinds of datasets: the Outer Genetic Algorithm (OGA-SVM) works on microarray gene expression datasets, whereas the Inner Genetic Algorithm (IGA-NNW) runs on DNA methylation datasets. Nested-GA is applied to a colon cancer dataset with 5-fold cross-validation. The Incremental Feature Selection (IFS) strategy is then used to obtain the smallest optimal gene subset. This gene subset was validated on an independent dataset, yielding 99.9% classification accuracy. The biological significance of the resulting optimal genes is then validated using enrichment analysis. Moreover, the results of Nested-GA are compared with those of other feature selection algorithms run on either gene expression or DNA methylation datasets. In these experiments, Nested-GA showed the highest classification performance with a small optimal feature subset. Furthermore, on lung cancer datasets containing two different cancer subtypes, Nested-GA achieved significantly better classification accuracy (98.4%) than previous research (84.6%) that used lung cancer DNA methylation data only. © 2018 Elsevier Ltd. All rights reserved.
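To make the pipeline described in the abstract more concrete, the following is a minimal sketch of its first two stages only: a t-test filter followed by a single genetic algorithm that searches for a feature subset scored by 5-fold cross-validation. It is not the authors' Nested-GA. There is no inner GA on methylation data, a nearest-centroid classifier stands in for the SVM, Welch's form of the t statistic is an assumption (the abstract does not say which variant is used), and every function name, parameter value, and the synthetic dataset are hypothetical choices made for illustration.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant of the t-test)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def t_test_filter(X, y, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_accuracy(X, y, feats, folds=5):
    """k-fold CV accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    if not feats:
        return 0.0
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        cents = {}
        for c in (0, 1):
            rows = [X[i] for i in train if y[i] == c]
            cents[c] = [mean(r[j] for r in rows) for j in feats]
        for i in test:
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c]))
                    for c in (0, 1)}
            correct += (min(dist, key=dist.get) == y[i])
    return correct / len(X)

def ga_select(X, y, candidates, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Single-level GA over bit masks of the candidate features (not nested)."""
    rng = random.Random(seed)

    def fitness(mask):
        feats = [c for c, bit in zip(candidates, mask) if bit]
        # Reward CV accuracy and lightly penalize larger subsets.
        return centroid_accuracy(X, y, feats) - 0.01 * len(feats) / len(candidates)

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection, keeps elites
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))   # one-point crossover
            child = p1[:cut] + p2[cut:]
            children.append([b ^ (rng.random() < p_mut) for b in child])  # bit-flip mutation
        pop = parents + children
    best = max(pop, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]

def make_toy_data(n=40, d=30, informative=(0, 1, 2), seed=1):
    """Synthetic two-class data; only the `informative` columns differ by class."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(3.0 * label if j in informative else 0.0, 1.0) for j in range(d)]
         for label in y]
    return X, y

X, y = make_toy_data()
candidates = t_test_filter(X, y, k=10)   # filter stage
selected = ga_select(X, y, candidates)   # wrapper stage
print("filtered:", sorted(candidates))
print("selected:", sorted(selected))
```

One caveat about this sketch: the filter is computed on all samples before cross-validation, which leaks information into the accuracy estimate. A careful evaluation would recompute the filter inside each training fold or, as the abstract describes, confirm the final subset on an independent dataset.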
