Genetics, Selection, Evolution

A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels

Abstract

Haplotype reconstruction (phasing) is an essential step in many applications, including imputation and genomic selection. The best phasing methods rely on both familial and linkage disequilibrium (LD) information. Because whole-genome sequencing (WGS) is expensive, generally only relatively small samples of reference individuals are sequenced, so little familial information is available. However, reference individuals have many relatives that have been genotyped (at lower density). The goal of our study was to improve the phasing of WGS data by integrating familial information from haplotypes obtained from a larger genotyped dataset, and to quantify its impact on imputation accuracy.

Aligning a pre-phased WGS panel [~5 million single nucleotide polymorphisms (SNPs)], phased with LD information only, to a 50k SNP array panel phased with both LD and familial information (called the scaffold) correctly assigned the parental origin of 99.62% of the WGS SNPs whose phase could be determined unambiguously from parental genotypes. Without the 50k haplotypes as a scaffold, this proportion dropped, as expected, to 50%, since LD information alone does not identify which haplotype is paternal. After alignment to the genotype phase, correctly phased segments were on average longer, while the number of switches decreased slightly. Most of the incorrectly assigned segments, and the resulting switches, were due to singleton errors. Imputation from the 50k SNP array to WGS data with improved phasing had only a marginal impact on imputation accuracy (measured as r²): on average, 90.47% with traditional techniques versus 90.65% with pre-phasing that integrates familial information. Differences were larger for SNPs located at chromosome ends and for rare variants. Using a denser WGS panel (~13 million SNPs) obtained with traditional variant filtering rules, we found similar results, although both phasing and imputation accuracy were lower.
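The evaluation criteria described above can be made concrete with a small Python sketch. It assumes genotypes coded 0/1/2 (copies of the alternative allele), treats a switch as a change between correctly and incorrectly assigned SNPs along the chromosome, counts a singleton error as an incorrectly assigned segment of length one, and computes r² as the squared Pearson correlation between true and imputed genotypes at one SNP. The abstract does not give these definitions, so they, and the hypothetical function names `mendelian_paternal_allele`, `phase_metrics` and `imputation_r2`, are our assumptions rather than the authors' code.

```python
from typing import List, Optional, Sequence, Tuple


def mendelian_paternal_allele(child: int, father: int, mother: int) -> Optional[int]:
    """Paternal allele (0/1) of a heterozygous child, or None when parental
    genotypes do not resolve the phase unambiguously.

    Assumed coding: 0, 1, 2 = copies of the alternative allele; any other
    value (e.g. -1) is treated as missing.
    """
    if child != 1:
        return None  # homozygous or missing child: no phase to resolve
    father_hom = father in (0, 2)
    mother_hom = mother in (0, 2)
    if father_hom and mother_hom and father // 2 + mother // 2 != 1:
        return None  # Mendelian inconsistency: skip this SNP
    if father_hom:
        return father // 2  # a homozygous father can transmit only one allele
    if mother_hom:
        return 1 - mother // 2  # paternal allele = the one the mother did not transmit
    return None  # both parents heterozygous or missing: ambiguous


def phase_metrics(est_paternal: Sequence[int],
                  true_paternal: Sequence[Optional[int]]) -> Tuple[float, int, int, int]:
    """Return (% correctly assigned, switches, wrong segments, singleton errors),
    using only SNPs whose true paternal allele is known (not None)."""
    status = [e == t for e, t in zip(est_paternal, true_paternal) if t is not None]
    if not status:
        return float("nan"), 0, 0, 0
    pct_correct = 100.0 * sum(status) / len(status)
    # Assumed definition: a switch is a change between correct and incorrect SNPs.
    switches = sum(1 for a, b in zip(status, status[1:]) if a != b)
    # Lengths of maximal runs of incorrectly assigned SNPs.
    wrong_runs: List[int] = []
    run = 0
    for ok in status:
        if not ok:
            run += 1
        elif run:
            wrong_runs.append(run)
            run = 0
    if run:
        wrong_runs.append(run)
    singletons = sum(1 for r in wrong_runs if r == 1)
    return pct_correct, switches, len(wrong_runs), singletons


def imputation_r2(true_dosage: Sequence[float], imputed_dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true and imputed genotypes at one SNP."""
    n = len(true_dosage)
    mx = sum(true_dosage) / n
    my = sum(imputed_dosage) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_dosage, imputed_dosage))
    sxx = sum((x - mx) ** 2 for x in true_dosage)
    syy = sum((y - my) ** 2 for y in imputed_dosage)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP: r2 undefined
    return sxy * sxy / (sxx * syy)
```

Under these definitions, a chromosome whose two haplotypes are swapped end to end scores 0% correct with no switches, which is why an arbitrarily oriented LD-only phasing averages about 50% across chromosomes and individuals.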
We present a phasing strategy for WGS data that indirectly integrates familial information: WGS haplotypes pre-phased with LD information only are aligned onto haplotypes obtained from genotyping data in a much larger population, phased with both LD and familial information. This strategy results in very few mismatches with the phase obtained by Mendelian segregation rules. Finally, we propose a strategy to further improve phasing accuracy based on haplotype clusters obtained with genotyping data.
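The abstract does not detail how the alignment is performed. As an illustration only, here is a minimal Python sketch of one plausible approach for one individual and one chromosome. It assumes alleles coded 0/1, a scaffold given as a dictionary from array SNP position to the (paternal, maternal) alleles of the 50k phasing, and a majority vote over `half_window` neighbouring anchor SNPs on each side. `align_to_scaffold` and its parameters are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def align_to_scaffold(
    wgs_pos: Sequence[int],
    wgs_h0: Sequence[int],
    wgs_h1: Sequence[int],
    scaffold: Dict[int, Tuple[int, int]],
    half_window: int = 2,
) -> Tuple[List[int], List[int]]:
    """Orient LD-only WGS haplotypes on a scaffold of (paternal, maternal)
    alleles keyed by array SNP position (hypothetical sketch).

    Returns (paternal, maternal) WGS haplotypes.
    """
    # 1. Anchors: array SNPs heterozygous in both phasings. Vote +1 if WGS
    #    haplotype 0 already carries the scaffold's paternal allele, else -1.
    anchors: List[Tuple[int, int]] = []
    for i, pos in enumerate(wgs_pos):
        if pos not in scaffold:
            continue
        s_pat, s_mat = scaffold[pos]
        if s_pat == s_mat or wgs_h0[i] == wgs_h1[i]:
            continue  # homozygous: carries no phase information
        anchors.append((i, 1 if wgs_h0[i] == s_pat else -1))
    if not anchors:
        return list(wgs_h0), list(wgs_h1)  # no anchors: orientation stays arbitrary

    # 2. Majority vote over neighbouring anchors, so that an isolated
    #    disagreement does not introduce two switches (ties keep the own vote).
    orient: List[Tuple[int, int]] = []
    for k, (i, own_vote) in enumerate(anchors):
        window = anchors[max(0, k - half_window): k + half_window + 1]
        total = sum(vote for _, vote in window)
        orient.append((i, own_vote if total == 0 else (1 if total > 0 else -1)))

    # 3. Each WGS SNP follows the orientation of the last anchor at or before
    #    it; SNPs before the first anchor follow the first anchor.
    paternal: List[int] = []
    maternal: List[int] = []
    k = 0
    for i in range(len(wgs_pos)):
        while k + 1 < len(orient) and orient[k + 1][0] <= i:
            k += 1
        keep = orient[k][1] == 1
        paternal.append(wgs_h0[i] if keep else wgs_h1[i])
        maternal.append(wgs_h1[i] if keep else wgs_h0[i])
    return paternal, maternal
```

Placing orientation changes exactly at anchor SNPs is a simplification: the true breakpoint between two disagreeing anchors is uncertain, and the haplotype-cluster refinement mentioned in the abstract is not modelled here.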
