Disease variant prediction with deep generative models of evolutionary data

Abstract

Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences(1-3). In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods(4-10) have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable(11). Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification(12-16). We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
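The label-free idea in the abstract, scoring a variant by how plausible it looks under a model of sequence variation across organisms, can be illustrated with a toy sketch. This is not EVE: the abstract describes a deep generative model, whereas the code below swaps in a much simpler site-independent frequency model so that the example stays self-contained. The alignment, the pseudocount, and the function names `site_frequencies` and `variant_score` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-column amino-acid frequencies with additive smoothing.

    `alignment` is a list of equal-length aligned sequences (toy data here).
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, wild_type, pos, mutant_aa):
    """Log-likelihood ratio of mutant vs. wild type at one 0-based position.

    Under a site-independent model, the ratio for the full sequences reduces
    to this single-column term. More negative means less evolutionarily
    plausible, which is read here as a crude proxy for pathogenicity.
    """
    wt_aa = wild_type[pos]
    return math.log(freqs[pos][mutant_aa]) - math.log(freqs[pos][wt_aa])


# Hypothetical toy alignment: column 0 is fully conserved, column 2 varies.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKVL", "MKVI"]
freqs = site_frequencies(toy_alignment)

conserved_hit = variant_score(freqs, "MKVL", 0, "W")  # M -> W at a conserved site
tolerated_hit = variant_score(freqs, "MKVL", 2, "I")  # V -> I at a variable site

print(f"M1W score: {conserved_hit:.3f}")
print(f"V3I score: {tolerated_hit:.3f}")
```

In this sketch the substitution at the conserved column scores lower than the one at the variable column, and no disease labels are used at any point. A site-independent model ignores dependencies between positions, which is one reason a richer generative model would be preferred in practice.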

Bibliographic details

  • Source
    Nature | 2021, issue 7883 | pp. 91-95 | 5 pages
  • Author affiliations

    Harvard Med Sch Dept Syst Biol Marks Grp Boston MA 02115 USA;

    Univ Oxford Dept Comp Sci OATML Grp Oxford England;

    Harvard Med Sch Dept Syst Biol Marks Grp Boston MA 02115 USA;

    Univ Oxford Dept Comp Sci OATML Grp Oxford England;

    Harvard Med Sch Dept Syst Biol Marks Grp Boston MA 02115 USA;

    Harvard Med Sch Dept Syst Biol Marks Grp Boston MA 02115 USA;

    Univ Oxford Dept Comp Sci OATML Grp Oxford England;

    Harvard Med Sch Dept Syst Biol Marks Grp Boston MA 02115 USA|Broad Inst Harvard & MIT Cambridge MA 02142 USA;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA; MEDLINE, USA; Chemical Abstracts (CA), USA
  • Original format: PDF
  • Text language: English (eng)
