Heredity

Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions

Abstract

Genomic prediction from whole-genome sequence data is attractive because, provided the causal mutations are in the data set, the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and the causal mutations affecting the trait. A cost-effective strategy could be to sequence a small proportion of the population and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing based on either pedigree relationships or haplotype diversity. The performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP) that selected for sequencing the subset of individuals maximizing the number of unique haplotypes (from single-nucleotide polymorphism (SNP) panel data) performed well across a range of variant minor allele frequencies. We then investigated the optimum trade-off between the number of individuals sequenced and fold coverage, given a maximum total sequencing effort. At a total sequencing effort of 600-fold coverage (600x), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this spectrum followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%.
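
The AHAP idea of choosing animals so that the sequenced set covers as many unique SNP-panel haplotypes as possible can be sketched as a greedy set-cover heuristic. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (one pair of phased haplotypes per genomic window), the function names, and the greedy search are hypothetical choices made here, and the published AHAP procedure may define windows, weight haplotypes, or search the candidate space differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format (an assumption, not from the paper): for each
# individual, one (haplotype_1, haplotype_2) pair of phased SNP-panel alleles
# per genomic window.
PhasedPanel = Dict[str, List[Tuple[str, str]]]


def haplotype_keys(windows: List[Tuple[str, str]]) -> Set[Tuple[int, str]]:
    """All (window index, haplotype string) pairs carried by one individual."""
    return {(w, hap) for w, pair in enumerate(windows) for hap in pair}


def select_by_unique_haplotypes(panel: PhasedPanel, n_select: int) -> List[str]:
    """Hypothetical greedy sketch: repeatedly add the individual carrying the
    most (window, haplotype) pairs not yet represented in the selected set."""
    covered: Set[Tuple[int, str]] = set()
    selected: List[str] = []
    remaining = sorted(panel)  # sorted so that ties break deterministically
    while remaining and len(selected) < n_select:
        # max() returns the first maximal element, i.e. the earliest ID on ties
        best = max(remaining, key=lambda ind: len(haplotype_keys(panel[ind]) - covered))
        selected.append(best)
        remaining.remove(best)
        covered |= haplotype_keys(panel[best])
    return selected


if __name__ == "__main__":
    toy_panel = {
        "A": [("0101", "0101"), ("11", "11")],
        "B": [("0101", "1100"), ("11", "00")],
        "C": [("1100", "0011"), ("00", "10")],
    }
    print(select_by_unique_haplotypes(toy_panel, 2))  # → ['B', 'C']
```

In the toy panel, animal A is homozygous and carries only haplotypes that B also carries, so once B is chosen A adds nothing new and C is picked instead. A greedy search is used here only because it is simple; it does not guarantee the maximum possible number of unique haplotypes.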
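
The 600x result reflects a trade-off: with a fixed budget, n = 600 / c animals can be sequenced at c-fold coverage (75 × 8 = 600). The toy model below is a simplification introduced here, not the model used in the study. It puts numbers on the two competing effects under loudly stated assumptions: more animals make it more likely that a rare allele is present among the sequenced haplotypes (treating haplotypes as independent draws, which ignores the pedigree), while higher coverage makes it more likely that both alleles of a heterozygote are observed (read depth assumed Poisson with mean c, each read sampling either allele with probability 1/2). It ignores imputation entirely, so it is not expected to reproduce the reported optimum.

```python
import math
from typing import List, Tuple


def prob_het_both_alleles_seen(coverage: float) -> float:
    """P(a heterozygous site shows both alleles at least once).

    Assumes read depth D ~ Poisson(coverage) and that each read carries either
    allele with probability 1/2, so P = sum_{d>=1} P(D=d) * (1 - 2 * 0.5**d),
    which simplifies to (1 - exp(-coverage / 2)) ** 2.
    """
    return (1.0 - math.exp(-coverage / 2.0)) ** 2


def prob_minor_allele_sampled(maf: float, n_individuals: int) -> float:
    """P(at least one of 2n sequenced haplotypes carries an allele at frequency maf).

    Treats haplotypes as independent draws, which ignores pedigree relationships.
    """
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


def budget_allocations(total_coverage: int, per_individual: List[int],
                       maf: float) -> List[Tuple[int, int, float, float]]:
    """For each per-individual coverage c, sequence n = total_coverage // c animals
    and report (n, c, P(minor allele sampled), P(het fully observed))."""
    rows = []
    for c in per_individual:
        n = total_coverage // c
        rows.append((n, c, prob_minor_allele_sampled(maf, n),
                     prob_het_both_alleles_seen(c)))
    return rows


if __name__ == "__main__":
    for n, c, p_sampled, p_het in budget_allocations(600, [2, 4, 8, 12, 20], maf=0.01):
        print(f"{n:4d} animals x {c:2d}x: allele sampled {p_sampled:.3f}, "
              f"het seen {p_het:.3f}")
```

Reading the printed table from top to bottom, the first probability falls while the second rises; where the real optimum lies depends on how well imputation fills in the low-coverage genotypes, which this sketch does not model.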
