IEEE International Conference on Bioinformatics and Bioengineering

An Associative Classification Based Approach for Detecting SNP-SNP Interactions in High Dimensional Genome



Abstract

Many studies have characterized genotype-phenotype relationships by identifying genetic variants associated with a specific disease. Researchers are paying increasing attention to interactions between SNPs that are strongly associated with disease in the absence of main effects. In this context, a number of machine learning and data mining tools have been applied to identify combinations of multi-locus SNPs in higher-order data. However, none of the current models can identify useful SNP-SNP interactions in high-dimensional genome data. Detecting these interactions is challenging because of bio-molecular complexity and computational limitations. The goal of this research was to implement associative classification and study its effectiveness for detecting epistasis in balanced and imbalanced datasets. The proposed approach was evaluated for two-locus epistatic interactions using simulated data. The datasets were generated for 5 different penetrance functions by varying heritability, minor allele frequency, and sample size. In total, 23,400 datasets were generated, and several experiments were conducted to identify the disease-causal SNP interactions. The classification accuracy of the proposed approach was compared with that of previous approaches. Although associative classification showed only a relatively small improvement in accuracy for balanced datasets, it outperformed existing approaches on higher-order multi-locus interactions in imbalanced datasets.
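To make the simulation setup concrete, the following is a minimal sketch of how a two-locus case-control dataset could be drawn from a penetrance table. It is not the authors' generator: the table `PENETRANCE`, the helper names, the Hardy-Weinberg assumption, and the sample sizes are all hypothetical choices made for illustration.

```python
import random

# Hypothetical two-locus penetrance table (an assumption, not from the paper):
# PENETRANCE[a][b] is the assumed probability of disease given a and b copies
# of the minor allele at the first and second SNP.
PENETRANCE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def genotype_probs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles, assuming Hardy-Weinberg."""
    q = maf
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def marginal_penetrance(penetrance, maf):
    """Disease probability per genotype at the first SNP, averaged over the second."""
    probs = genotype_probs(maf)
    return [sum(probs[b] * penetrance[a][b] for b in range(3)) for a in range(3)]


def simulate(penetrance, maf, n_cases, n_controls, seed=0):
    """Rejection-sample (snp_a, snp_b, label) rows until both class quotas are met."""
    rng = random.Random(seed)
    probs = genotype_probs(maf)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Draw the two SNP genotypes independently (assumes no linkage).
        a, b = rng.choices([0, 1, 2], weights=probs, k=2)
        affected = rng.random() < penetrance[a][b]
        if affected and len(cases) < n_cases:
            cases.append((a, b, 1))
        elif not affected and len(controls) < n_controls:
            controls.append((a, b, 0))
    return cases + controls


# An imbalanced dataset: 50 cases and 200 controls (illustrative sizes).
dataset = simulate(PENETRANCE, maf=0.5, n_cases=50, n_controls=200)
```

With a minor allele frequency of 0.5, `marginal_penetrance(PENETRANCE, 0.5)` is the same for every genotype of a single SNP, so in this toy table neither SNP shows a main effect and the disease signal exists only in the pair. Changing `n_cases` and `n_controls` gives balanced or imbalanced variants of the same model.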
