Journal: Artificial Intelligence in Medicine

Hybrid genetic algorithm-neural network: Feature extraction for unpreprocessed microarray data


Abstract

Objective: Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data.

Method: The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated using two benchmark microarray datasets with different array platforms and differing numbers of classes: a 2-class oligonucleotide microarray dataset for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for small round blue cell tumours (SRBCTs). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time.

Results: The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and, for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but can also explore the genes that are differentially expressed in multiple cancer types.

Conclusions: The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data as well as extracting known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading. Although the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
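The idea in the Method paragraph, a GA whose fitness is the number of samples a feedforward ANN labels correctly, with the gene subset and the ANN weights evolving together, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, genetic operators, network size, or parameters, so everything below (the names `gann`, `breed`, `toy_data`, the one-hidden-layer network, uniform crossover, truncation selection, and the synthetic two-class data) is an assumption chosen for brevity.

```python
import math
import random

def forward(weights, x, n_hidden):
    """One-hidden-layer feedforward net (assumed architecture); returns class 0 or 1."""
    n_in, pos, hidden = len(x), 0, []
    for _ in range(n_hidden):
        # Each hidden unit: n_in input weights followed by one bias.
        s = weights[pos + n_in] + sum(weights[pos + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
        pos += n_in + 1
    # Output unit: n_hidden weights followed by one bias.
    out = weights[pos + n_hidden] + sum(weights[pos + h] * hidden[h] for h in range(n_hidden))
    return 1 if out > 0 else 0

def fitness(chrom, X, y, n_hidden):
    """GA fitness = number of samples the ANN labels correctly."""
    genes, weights = chrom
    return sum(forward(weights, [row[g] for g in genes], n_hidden) == label
               for row, label in zip(X, y))

def random_chrom(n_genes, k, n_hidden, rng):
    """A chromosome holds k gene indices plus the ANN weights (hypothetical encoding)."""
    n_weights = n_hidden * (k + 1) + n_hidden + 1
    return (rng.sample(range(n_genes), k), [rng.gauss(0, 1) for _ in range(n_weights)])

def breed(p1, p2, n_genes, rng, p_mut=0.1):
    """Uniform crossover and mutation on both parts, so genes and weights co-evolve.
    Duplicate gene indices may appear in a child; the sketch does not repair them."""
    genes = [rng.choice(pair) for pair in zip(p1[0], p2[0])]
    weights = [rng.choice(pair) for pair in zip(p1[1], p2[1])]
    genes = [rng.randrange(n_genes) if rng.random() < p_mut else g for g in genes]
    weights = [w + rng.gauss(0, 0.5) if rng.random() < p_mut else w for w in weights]
    return (genes, weights)

def gann(X, y, k=3, n_hidden=2, pop_size=30, generations=40, seed=0):
    """Returns the best chromosome and the best fitness seen at each generation."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [random_chrom(n_genes, k, n_hidden, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y, n_hidden), reverse=True)
        history.append(fitness(pop[0], X, y, n_hidden))
        elite = pop[:2]                    # elitism: the best two survive unchanged
        parents = pop[:pop_size // 2]      # truncation selection (assumed)
        pop = elite + [breed(rng.choice(parents), rng.choice(parents), n_genes, rng)
                       for _ in range(pop_size - len(elite))]
    best = max(pop, key=lambda c: fitness(c, X, y, n_hidden))
    history.append(fitness(best, X, y, n_hidden))
    return best, history

def toy_data(n_samples=40, n_genes=20, seed=1):
    """Synthetic stand-in for a microarray dataset: only gene 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] += 3 if label else -3
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = toy_data()
    (genes, _), history = gann(X, y)
    print("selected genes:", genes, "correct samples:", history[-1], "of", len(y))
```

Because the two best chromosomes are carried over unchanged, the best fitness never decreases between generations. The sketch handles only two classes; the 4-class SRBCT case would need a multi-output network, which the abstract does not describe in enough detail to reproduce here.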
