Journal: Methods: A Companion to Methods in Enzymology

On the design and analysis of next-generation sequencing genotyping for a cohort with haplotype-informative reads


Abstract

Next-generation sequencing (NGS) technologies, which can provide base-pair resolution genetic information for all types of genetic variations, are increasingly used in genetics research. However, due to the complex nature of NGS technologies and analytics and their relatively high cost, investigators face practical challenges for both design and analysis. These challenges are further complicated by recent methodological developments that make it possible to use haplotype information in sequencing reads. In light of these developments, we conducted comprehensive simulations to evaluate the effects of sequencing coverage, insert size of paired-end reads, and sample size on genotype calling and haplotype phasing in NGS studies. In contrast to previous studies that typically use idealized scenarios to tease out the effects of individual design and analytic decisions, we used a complete analytical pipeline from read mapping and variant detection to genotype calling and haplotype phasing so that we can assess the joint effects of multiple decisions and thus make more realistic recommendations to investigators. Consistent with previous studies, we found that the use of haplotype information in reads can improve the accuracy of genotype calling and haplotype phasing, and we also found that a mixture of short and long insert sizes of paired-end reads may offer even greater accuracy. However, this benefit is only clear in high-coverage sequencing where variant detection is close to perfect. Finally, we observed that LD-based refinement methods do not always outperform single-site-based methods for genotype calling. Therefore, we should choose analytical methods that are appropriate to the sequencing coverage and sample size in order to use haplotype information in sequencing reads. (C) 2015 Elsevier Inc. All rights reserved.
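
The abstract does not say which single-site model the authors used. As an illustration only, the sketch below shows one common form of single-site genotype calling: a binomial likelihood over the reference and alternate read counts at a biallelic site. The function names and the `error_rate` default of 0.01 are assumptions made for this example and are not taken from the paper.

```python
from math import comb, log


def genotype_log_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Log-likelihoods of the read counts for genotypes with 0, 1, 2 alt alleles.

    Assumes a biallelic site, independent reads, and a symmetric per-base
    error rate (all simplifying assumptions for illustration).
    """
    n = n_ref + n_alt
    logliks = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Probability that a single read shows the alt allele under this genotype.
        p_alt = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        logliks.append(
            log(comb(n, n_alt)) + n_alt * log(p_alt) + n_ref * log(1 - p_alt)
        )
    return logliks


def call_genotype(n_ref, n_alt, error_rate=0.01):
    """Return the number of alt alleles (0, 1, or 2) with the highest likelihood."""
    logliks = genotype_log_likelihoods(n_ref, n_alt, error_rate)
    return max(range(3), key=lambda g: logliks[g])
```

Under this toy model, `call_genotype(5, 5)` returns the heterozygous call, while `call_genotype(2, 0)` returns homozygous reference. A true heterozygote covered by only two reads that both happen to carry the reference allele would therefore be miscalled, which is one way to see why coverage enters the design question the abstract raises.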
