Journal: BMC Genomics

Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application


Abstract

Genotype-phenotype association is a long-standing problem in bioinformatics. Identifying both the marginal and epistatic effects of genetic markers, such as single nucleotide polymorphisms (SNPs), has become an integral part of genome-wide association studies (GWAS), where it helps derive “causal” genetic risk factors and their interactions, which play critical roles in life and disease systems. Identifying “synergistic” interactions with respect to the outcome of interest can support accurate phenotypic prediction and improve understanding of the underlying mechanisms of system behavior. Many statistical measures for estimating synergistic interactions have been proposed in the literature for this purpose. However, beyond evaluations of empirical performance, there is still no theoretical analysis of the power and limitations of these measures. In this paper, we show that the existing information-theoretic multivariate synergy depends on a small subset of the interaction parameters in the model, sometimes on only one interaction parameter. In addition, we propose an adjusted version of multivariate synergy as a new measure for estimating interactive effects, and demonstrate its effectiveness in experiments on both simulated data sets and a real-world GWAS data set. We provide rigorous theoretical analysis and empirical evidence on why the information-theoretic multivariate synergy helps identify genetic risk factors via synergistic interactions. We further establish a rigorous sample complexity analysis for detecting interactive effects, confirmed on both simulated and real-world data sets.
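The abstract does not spell out the synergy measure it analyzes. As a rough illustration only, the sketch below assumes the common two-variable form of information-theoretic synergy, Syn(X1, X2; Y) = I(X1, X2; Y) - I(X1; Y) - I(X2; Y), estimated with simple plug-in (empirical frequency) entropies. The function names and the toy XOR data are hypothetical and are not taken from the paper; the paper's adjusted measure is not reproduced here.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def bivariate_synergy(x1, x2, y):
    """Assumed two-variable synergy: I(X1, X2; Y) - I(X1; Y) - I(X2; Y)."""
    joint = list(zip(x1, x2))  # treat the pair (X1, X2) as one variable
    return (
        mutual_information(joint, y)
        - mutual_information(x1, y)
        - mutual_information(x2, y)
    )


# Toy data (hypothetical): the outcome is the XOR of two binary markers,
# so neither marker is informative alone but the pair determines the outcome.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

print(bivariate_synergy(x1, x2, y))
```

In this toy case each marker alone carries no information about the outcome while the pair determines it, so the estimate is positive. With real genotype data, plug-in estimates are biased for small samples, which is one reason a sample complexity analysis like the one mentioned in the abstract matters.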
