Journal: Network Modeling Analysis in Health Informatics and Bioinformatics

Rule-based analysis for detecting epistasis using associative classification mining

Abstract

Advances in high-throughput human genome sequencing and in computational capacity have greatly improved the understanding of the genetic architecture underlying complex diseases. High-throughput genotyping and next-generation sequencing technologies now provide large-scale data for genetic epidemiological analysis, and these advances have led to the identification of many single nucleotide polymorphisms (SNPs) associated with complex diseases. Interactions between SNPs that contribute to disease susceptibility (epistasis) are increasingly being explored in the literature. Such interaction studies are mathematically challenging and computationally demanding, and a number of data mining and machine learning approaches have been proposed to address these challenges. The goal of this research is to implement associative classification and evaluate its effectiveness for detecting epistasis in both balanced and imbalanced datasets. The proposed approach was evaluated on simulated data for models ranging from single-locus to six-locus interactions. Datasets were generated for five different penetrance functions by varying heritability, minor allele frequency, and sample size. In total, 57,300 datasets were generated, and several experiments were conducted to identify the disease-causing SNP interactions. The classification accuracy of the proposed approach was compared with that of existing approaches, and the experimental results showed significant improvements in accuracy for detecting interactions associated with the phenotype. The approach was also successfully applied to sporadic breast cancer data, where the results revealed an interaction among six polymorphisms spanning five different estrogen-metabolism genes.
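The abstract does not spell out how the class association rules are generated or ranked. As a rough illustration only, the sketch below assumes a CBA-style associative classifier over genotypes coded 0/1/2 (number of minor alleles). It enumerates rules of the form {SNP_i = g_i, ...} -> case/control, keeps those above a minimum support and confidence, ranks them by confidence, then support, then antecedent length, and predicts with the first matching rule. The function names `mine_class_rules` and `classify`, the thresholds, the exhaustive enumeration (used in place of Apriori-style pruning), and the toy parity model are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from itertools import combinations, product


def mine_class_rules(genotypes, labels, max_order=2,
                     min_support=0.05, min_confidence=0.7):
    """Enumerate class association rules {SNP_i = g_i, ...} -> class.

    Hypothetical sketch: every SNP combination up to `max_order` loci is
    scanned exhaustively (no Apriori pruning), so it only suits a few SNPs.
    Returns ranked (antecedent, class, confidence, support) tuples.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    rules = []
    for order in range(1, max_order + 1):
        for snps in combinations(range(n_snps), order):
            # Count each genotype pattern overall and per class label.
            pattern_total = Counter()
            pattern_class = Counter()
            for row, label in zip(genotypes, labels):
                pattern = tuple(row[s] for s in snps)
                pattern_total[pattern] += 1
                pattern_class[(pattern, label)] += 1
            for (pattern, label), count in pattern_class.items():
                support = count / n                          # P(antecedent and class)
                confidence = count / pattern_total[pattern]  # P(class | antecedent)
                if support >= min_support and confidence >= min_confidence:
                    antecedent = tuple(zip(snps, pattern))
                    rules.append((antecedent, label, confidence, support))
    # CBA-style precedence: higher confidence, then higher support,
    # then fewer loci in the antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules


def classify(rules, row, default_class):
    """Predict with the highest-ranked rule whose antecedent matches `row`."""
    for antecedent, label, _, _ in rules:
        if all(row[snp] == g for snp, g in antecedent):
            return label
    return default_class


# Toy data (hypothetical): every 0/1/2 genotype combination of three SNPs.
# Case status is the parity of the SNP 0 + SNP 1 genotype sum; SNP 2 is noise.
X = [list(g) for g in product(range(3), repeat=3)]
y = [(g0 + g1) % 2 for g0, g1, _ in X]

rules = mine_class_rules(X, y)
default = Counter(y).most_common(1)[0][0]  # majority class as fallback
predictions = [classify(rules, row, default) for row in X]

for antecedent, label, conf, supp in rules[:3]:
    print(antecedent, "->", label, f"conf={conf:.2f} supp={supp:.2f}")
print("training accuracy:", sum(p == t for p, t in zip(predictions, y)) / len(y))
```

In this toy model no single-SNP rule reaches the 0.7 confidence threshold, while every two-locus SNP 0/SNP 1 genotype pattern predicts its class perfectly, so only interaction rules survive. On real genotype data the exhaustive scan examines up to C(m, k)·3^k genotype patterns for m SNPs at interaction order k, which is why associative classifiers usually prune candidates by minimum support before computing confidence.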