PLoS Computational Biology

FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units


Abstract

Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, use pedigree information when family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating pedigree information, through the Mendelian genetic model, into variant calling. To accommodate variations in data complexity, FamSeq provides four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit (GPU) version of the Bayesian network algorithm, the Elston-Stewart algorithm, and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm on an NVIDIA GPU; this algorithm handles pedigrees with inbreeding loops without losing calculation precision. To compare the four methods, we applied FamSeq to pedigree sequencing data with family sizes ranging from 7 to 12. When the pedigree has no inbreeding loops, the Elston-Stewart algorithm gives analytical results in a short time. When the pedigree has inbreeding loops, we recommend the Bayesian network method, which provides exact answers. With the GPU parallelization, the Bayesian network method processed the whole-genome sequencing data of a family of 12 individuals within two days, a 10-fold reduction compared with the time required on a central processing unit (CPU).
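
To make concrete how pedigree information can change a genotype call, the following is a minimal sketch for a single parent-child trio at one biallelic site. It is not FamSeq's code and does not use any of its four algorithms: it simply enumerates all 27 joint genotype configurations, which is only feasible for very small families. The function names, the Hardy-Weinberg founder prior, the default alternate allele frequency of 0.01, the omission of de novo mutation, and the example likelihoods are all assumptions made for illustration.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def founder_prior(g, alt_freq):
    """Hardy-Weinberg prior for a founder genotype (assumed for this sketch)."""
    p, q = 1.0 - alt_freq, alt_freq
    return (p * p, 2 * p * q, q * q)[g]


def transmit_alt(g):
    """Probability that a parent with genotype g passes on the alternate allele."""
    return g / 2.0


def transmission(child, father, mother):
    """Mendelian P(child | father, mother), ignoring de novo mutation."""
    f, m = transmit_alt(father), transmit_alt(mother)
    return ((1 - f) * (1 - m), f * (1 - m) + (1 - f) * m, f * m)[child]


def trio_posteriors(lik_father, lik_mother, lik_child, alt_freq=0.01):
    """Marginal genotype posteriors for each trio member by full enumeration.

    Each lik_* argument is a length-3 sequence of P(reads | genotype).
    """
    post = {"father": [0.0] * 3, "mother": [0.0] * 3, "child": [0.0] * 3}
    total = 0.0
    for gf, gm, gc in product(GENOTYPES, repeat=3):
        joint = (
            founder_prior(gf, alt_freq)
            * founder_prior(gm, alt_freq)
            * transmission(gc, gf, gm)
            * lik_father[gf] * lik_mother[gm] * lik_child[gc]
        )
        total += joint
        post["father"][gf] += joint
        post["mother"][gm] += joint
        post["child"][gc] += joint
    # Normalize so that each individual's three posteriors sum to 1.
    return {name: [x / total for x in vals] for name, vals in post.items()}


if __name__ == "__main__":
    # Hypothetical likelihoods: both parents look homozygous reference,
    # while the child's reads on their own favor a heterozygous call.
    parent = [0.98, 0.01, 0.01]
    child = [0.01, 0.99, 0.0]
    for name, probs in trio_posteriors(parent, parent, child).items():
        print(name, [round(x, 4) for x in probs])
```

With these hypothetical inputs, the child's posterior shifts toward homozygous reference because a heterozygous child is hard to reconcile with two homozygous-reference parents under Mendelian transmission. This is the kind of false positive reduction the abstract describes. The cost of full enumeration grows as 3^n with family size n, which is why larger pedigrees call for structured methods such as those named in the abstract.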