Journal: Genetics, selection, evolution

Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection

Abstract

Background Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.
Results We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, 1060 up to 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1% higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar or up to 1.4% lower compared to that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.
Conclusions The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
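
The two-step procedure described in the abstract can be illustrated with a minimal sketch on simulated data. It is not the authors' implementation: ridge regression is swapped in for BSSVS to keep the example short, and the interleaved assignment of variants to subsets, the rule of keeping the variants with the largest absolute effects, and all names (`ridge_effects`, `split_and_merge`, `n_keep`, `lam`) are assumptions made for illustration. The sketch returns both kinds of prediction mentioned above: one from the merged set of selected variants, and one computed as the average of the per-subset predictions. For scale, 4,154,064 variants divided over 106 subsets gives roughly 39,189 variants per subset, as reported in the abstract.

```python
import numpy as np


def ridge_effects(X, y, lam=1.0):
    """Estimate variant effects by ridge regression (a stand-in for BSSVS).

    Uses the dual form, which solves an n x n system and is therefore
    cheaper when variants far outnumber animals.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


def split_and_merge(X_ref, y_ref, X_val, n_subsets, n_keep, lam=1.0):
    """Hypothetical split-and-merge sketch; inputs are assumed centered."""
    n_variants = X_ref.shape[1]
    # Assumption: interleave variants so every subset spans the whole genome.
    subsets = [np.arange(s, n_variants, n_subsets) for s in range(n_subsets)]

    selected, subset_preds = [], []
    for idx in subsets:  # step 1: independent per subset, so parallelizable
        beta = ridge_effects(X_ref[:, idx], y_ref, lam)
        subset_preds.append(X_val[:, idx] @ beta)
        # Assumption: keep the n_keep variants with the largest |effect|.
        top = np.argsort(np.abs(beta))[-n_keep:]
        selected.append(idx[top])

    # Step 2: refit on the merged set of selected variants.
    merged = np.sort(np.concatenate(selected))
    beta_merged = ridge_effects(X_ref[:, merged], y_ref, lam)
    merged_pred = X_val[:, merged] @ beta_merged

    # Alternative: average the per-subset predictions.
    averaged_pred = np.mean(subset_preds, axis=0)
    return merged, merged_pred, averaged_pred


# Simulated data: 200 reference and 100 validation animals, 1000 variants.
rng = np.random.default_rng(0)
n_ref, n_val, n_variants = 200, 100, 1000
X = rng.standard_normal((n_ref + n_val, n_variants))
true_beta = np.zeros(n_variants)
causal = rng.choice(n_variants, 10, replace=False)
true_beta[causal] = rng.standard_normal(10)
y = X @ true_beta + rng.standard_normal(n_ref + n_val)

X_ref, X_val = X[:n_ref], X[n_ref:]
y_ref, y_val = y[:n_ref], y[n_ref:]

merged, merged_pred, averaged_pred = split_and_merge(
    X_ref, y_ref, X_val, n_subsets=10, n_keep=20
)
print("variants in merged set:", merged.size)
print("accuracy (merged):  ", np.corrcoef(merged_pred, y_val)[0, 1])
print("accuracy (averaged):", np.corrcoef(averaged_pred, y_val)[0, 1])
```

Note that the sketch selects variants and refits the merged model on the same reference data, which is the setting the abstract reports as producing biased predictions. The loop over subsets has no dependencies between iterations, which is what makes the first step suitable for parallel processing.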
