
Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA)


Abstract

Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and have carried a heavy multiple-testing adjustment burden because of the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) to identify associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record (EHR) data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor because it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data from 12,759 individuals with EHRs and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF on 1,853 distinct phenotypes and identified six topics. We then tested the association of each topic with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between the LPA variant and a topic enriched for lung cancer (P < 0.001), which had not previously been identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
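The workflow described in the abstract (factor a patient-by-phenotype matrix into topics, then relate each patient's topic weights to a genotype) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the names `nmf`, `genotype`, `true_W`, and `true_H` are hypothetical, the solver is the standard Lee-Seung multiplicative update for the Frobenius loss (not necessarily what the authors used), and a plain Pearson correlation stands in for whatever association test the study applied.

```python
import numpy as np


def nmf(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative X (patients x phenotypes) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius loss.
    W holds per-patient topic weights, H holds per-topic phenotype loadings.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, n_topics)) + eps
    H = rng.random((n_topics, n_cols)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for EHR phenotype counts (assumption: not real data).
rng = np.random.default_rng(42)
n_patients, n_phenotypes, n_topics = 300, 40, 3
genotype = rng.integers(0, 3, size=n_patients)  # risk-allele count: 0, 1, or 2

true_H = rng.random((n_topics, n_phenotypes))
true_W = rng.random((n_patients, n_topics))
true_W[:, 0] += 0.5 * genotype  # plant a genotype effect in one latent topic
X = rng.poisson(true_W @ true_H).astype(float)

# Learn topics, then correlate each topic's patient weights with genotype.
W, H = nmf(X, n_topics)
corr = [float(np.corrcoef(W[:, k], genotype)[0, 1]) for k in range(n_topics)]
rel_err = float(np.linalg.norm(X - W @ H) / np.linalg.norm(X))

for k, r in enumerate(corr):
    print(f"topic {k}: Pearson r with genotype = {r:+.3f}")
print(f"relative reconstruction error = {rel_err:.3f}")
```

Testing a handful of topics rather than each of the 1,853 phenotypes separately is what reduces the multiple-testing burden the abstract mentions: the number of hypotheses equals the number of topics.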
