Journal: PLoS Computational Biology

High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

Abstract

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application.
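
The third step above (picking the most likely pair of alleles at a locus under a simple likelihood framework) can be illustrated with a toy sketch. This is not the HLA*PRG implementation, and the abstract does not specify the model. The sketch assumes reads are already placed at known offsets, each base is wrong independently with a fixed error rate, and each read comes from either of the two alleles with equal probability. The function names, the allele names, and the sequences are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def read_prob(start, bases, allele, err=0.01):
    """P(read | allele) with independent per-base errors (assumed toy model)."""
    p = 1.0
    for offset, base in enumerate(bases):
        match = allele[start + offset] == base
        p *= (1.0 - err) if match else err / 3.0
    return p


def best_allele_pair(reads, alleles, err=0.01):
    """Return the unordered allele pair with the highest log-likelihood.

    reads   -- list of (start_offset, bases) tuples, already aligned
    alleles -- dict mapping allele name to sequence (equal lengths)
    """
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(sorted(alleles), 2):
        # Each read is assumed to come from either haplotype with probability 1/2.
        ll = sum(
            math.log(0.5 * read_prob(s, b, alleles[a1], err)
                     + 0.5 * read_prob(s, b, alleles[a2], err))
            for s, b in reads
        )
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


# Hypothetical toy data: three candidate alleles, reads drawn from toy_A and toy_B.
toy_alleles = {"toy_A": "ACGTACGT", "toy_B": "ACGTTCGT", "toy_C": "ACCTACGT"}
toy_reads = [(2, "GTAC"), (2, "GTTC"), (0, "ACGT"), (3, "TACG"), (3, "TTCG")]
pair, log_lik = best_allele_pair(toy_reads, toy_alleles)
print(pair, round(log_lik, 3))
```

Enumerating every pair is only feasible for a handful of candidates. With thousands of known alleles per locus, a practical tool would need to restrict or prune the candidate set, and how HLA*PRG does so is not described in the abstract.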
