PLoS Computational Biology

A Probabilistic Model to Predict Clinical Phenotypic Traits from Genome Sequencing

Abstract

Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area under the ROC curve (AUC) > 0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
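
The evaluation described in the abstract (AUC for a dichotomous phenotype, with significance assessed by a permutation test) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auc` and `permutation_p_value`, the number of permutations, and the toy probabilities and labels are all assumptions made for illustration only.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive individual has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided permutation test: shuffle the labels and count how often
    the shuffled AUC is at least as large as the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical predicted probabilities for ten individuals (made-up data);
# a label of 1 means the individual reports the phenotype.
toy_scores = [0.91, 0.85, 0.40, 0.35, 0.30, 0.22, 0.18, 0.10, 0.07, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

observed_auc, p_value = permutation_p_value(toy_scores, toy_labels)
print(f"AUC = {observed_auc:.3f}, permutation p = {p_value:.4f}")
```

Shuffling the labels while holding the predicted probabilities fixed simulates the null hypothesis that the predictions carry no information about the phenotype, which is the general idea behind a permutation test of an AUC.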
