Journal: Engineering Applications of Artificial Intelligence

Complex diseases SNP selection and classification by hybrid Association Rule Mining and Artificial Neural Network-based Evolutionary Algorithms


Abstract

Recently, various techniques have been applied to classify Single Nucleotide Polymorphism (SNP) data, since SNPs have been shown to be implicated in various human diseases. One of the major problems with SNP sets is the "large p, small n" problem: the number of features is high and the number of samples is small, which makes the classification task complex. This paper proposes a new hybrid intelligent technique, based on Association Rule Mining (ARM) and Neural Networks (NN) and using Evolutionary Algorithms (EA), to deal with this dimensionality problem. On the one hand, ARM optimized by Grammatical Evolution (GE) selects the most informative features and reduces the dimensionality by extracting, in parallel, associations between SNPs in two separate datasets of case and control samples. On the other hand, to complement this step, an NN is used for efficient classification. A Genetic Algorithm (GA) sets the parameters of the two combined techniques. The proposed GA-NN-GEARM approach was applied to four different SNP datasets obtained from the NCBI Gene Expression Omnibus (GEO) website. The resulting model reached high classification accuracy, in some cases 100%, and outperformed several feature selection techniques combined with different classifiers.
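The feature selection idea in the abstract, mining associations between SNPs separately in the case and control sets and keeping the SNPs whose associations differ, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's implementation: the genotype coding (0, 1, 2), the helper names `rule_stats` and `select_snps`, the thresholds, and the toy data are all hypothetical. The sketch also enumerates pairwise rules exhaustively instead of searching with Grammatical Evolution, and it omits the NN classifier and the GA parameter tuning.

```python
from itertools import permutations


def rule_stats(samples, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    antecedent and consequent are (snp_index, genotype) pairs.
    """
    (i, gi), (j, gj) = antecedent, consequent
    n_ante = sum(1 for s in samples if s[i] == gi)
    n_both = sum(1 for s in samples if s[i] == gi and s[j] == gj)
    support = n_both / len(samples) if samples else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def select_snps(cases, controls, min_support=0.75, min_gap=0.5):
    """Keep SNPs that appear in a pairwise rule which is frequent in at least
    one group and whose confidence differs by min_gap between the groups."""
    n_snps = len(cases[0])
    genotypes = (0, 1, 2)  # assumed coding: minor-allele count
    selected = set()
    for i, j in permutations(range(n_snps), 2):
        for gi in genotypes:
            for gj in genotypes:
                sup_case, conf_case = rule_stats(cases, (i, gi), (j, gj))
                sup_ctrl, conf_ctrl = rule_stats(controls, (i, gi), (j, gj))
                frequent = max(sup_case, sup_ctrl) >= min_support
                if frequent and abs(conf_case - conf_ctrl) >= min_gap:
                    selected.update((i, j))
    return sorted(selected)


# Made-up toy data: 4 SNPs per sample. SNPs 0 and 1 share genotype 2
# in every case sample but never together in the control samples.
cases = [[2, 2, 0, 1], [2, 2, 1, 0], [2, 2, 0, 0], [2, 2, 1, 1]]
controls = [[2, 0, 0, 1], [0, 0, 1, 0], [2, 0, 0, 0], [0, 1, 1, 1]]

selected = select_snps(cases, controls)
print(selected)
```

In a full pipeline along the lines the abstract describes, the columns in `selected` would be the reduced feature set passed to a classifier. The exhaustive loop here grows quadratically with the number of SNPs, which is one reason a guided search is attractive on real "large p, small n" data.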
