Heredity: An International Journal of Genetics

Toward genomic prediction from whole-genome sequence data: Impact of sequencing design on genotype imputation and accuracy of predictions

Abstract

Genomic prediction from whole-genome sequence data is attractive because its accuracy is no longer bounded by the extent of linkage disequilibrium between DNA markers and the causal mutations affecting the trait, provided that the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. The performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. One strategy (AHAP), which selected the subset of individuals that maximized the number of unique haplotypes sequenced (with haplotypes taken from single-nucleotide polymorphism (SNP) panel data), gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence, and at what fold coverage, given a maximum total sequencing effort. At 600 total fold coverage (600×), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data over dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this spectrum followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%.
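
The abstract does not say how AHAP searches for the subset that maximizes unique haplotypes, so the following is only a minimal sketch of one plausible approach: a greedy maximum-coverage heuristic. The function names, the toy haplotype labels, and the input layout (a mapping from individual to the set of haplotype IDs it carries) are all assumptions made for illustration, not the authors' implementation. A second helper lists how many individuals fit a fixed total fold-coverage budget at different per-individual coverages; it only enumerates candidate designs and does not assess which is best.

```python
from typing import Dict, List, Set, Tuple


def greedy_haplotype_selection(
    haplotypes: Dict[str, Set[str]], n_select: int
) -> List[str]:
    """Pick up to n_select individuals, each time taking the one that adds
    the most haplotypes not yet covered (greedy maximum coverage).

    Hypothetical stand-in for a haplotype-diversity selection strategy.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    candidates = dict(haplotypes)  # shallow copy so the input is untouched
    for _ in range(min(n_select, len(candidates))):
        # Sorting the keys breaks ties deterministically by individual name.
        best = max(sorted(candidates), key=lambda ind: len(candidates[ind] - covered))
        if not candidates[best] - covered:
            break  # no remaining individual carries a new haplotype
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


def designs_for_budget(
    total_fold: int, coverages: List[int]
) -> List[Tuple[int, int]]:
    """Return (number of individuals, coverage) pairs that fit the budget."""
    return [(total_fold // c, c) for c in coverages]


# Toy data: haplotype IDs are invented labels, not real genotypes.
toy = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h1", "h2", "h3", "h4"},
    "D": {"h5"},
}
selected = greedy_haplotype_selection(toy, 2)
designs = designs_for_budget(600, [4, 8, 12])
```

With the toy data, the first pick is the individual carrying the most haplotypes, and the second is the only one that still adds something new. For a 600-fold budget, eightfold coverage allows 600 / 8 = 75 individuals, which matches the design reported above; the study identified that design as optimal through simulation, which this arithmetic alone does not reproduce.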
