An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions

Abstract

Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models.

Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit, and is determined by, the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model structure and order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way interaction, involving fewer than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods.

Supplementary information: Supplementary data are available at Bioinformatics online.
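As a rough illustration of aspects (ii) and (iv), the sketch below shows one way a k-way interaction with per-SNP dominant or recessive coding could be turned into a binary feature, together with the standard form of the Bayesian Information Criterion. It is not the authors' implementation. The genotype encoding (minor-allele counts 0, 1, 2), the function names and the toy data are all assumptions made for this example, and the exact BIC cost used by MECPM may differ from the textbook formula.

```python
import math

# Assumption: each genotype is a minor-allele count (0, 1 or 2).
def code_snp(genotype, mode):
    """Binary coding of one SNP.

    'dominant'  -> true if at least one minor allele is present
    'recessive' -> true only if both alleles are the minor allele
    """
    if mode == "dominant":
        return genotype >= 1
    if mode == "recessive":
        return genotype == 2
    raise ValueError(f"unknown coding: {mode}")


def interaction_feature(sample, terms):
    """Return 1 if every (snp_index, mode) term holds for this sample, else 0."""
    return int(all(code_snp(sample[i], mode) for i, mode in terms))


def bic(log_likelihood, n_params, n_samples):
    """Textbook BIC: -2 log L + p log N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


# Hypothetical toy data: 4 samples x 3 SNPs.
samples = [
    [0, 2, 1],
    [1, 2, 0],
    [2, 1, 2],
    [1, 0, 2],
]

# A two-way interaction: SNP 0 coded dominant AND SNP 1 coded recessive.
two_way = [(0, "dominant"), (1, "recessive")]
features = [interaction_feature(s, two_way) for s in samples]
print(features)
```

Because the penalty term grows with the number of parameters, a candidate interaction is only worth adding under this criterion if it raises the log-likelihood by more than its share of the penalty, which is what allows interactions of different orders to be compared on one scale.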
