Parallel Pair-HMM SNP Detection


Abstract

I. MOTIVATION: Each instrument run of a next generation sequencing platform generates massive amounts of data, presenting researchers with analytical challenges that require innovative, computationally efficient statistical solutions. Here we present a parallel implementation of a probabilistic Pair-Hidden Markov Model for base calling and SNP detection in next generation sequencing data. Our approach incorporates multiple sources of error into the base-calling procedure, which leads to more accurate results. In addition, it applies a likelihood ratio test that gives researchers straightforward SNP-calling cutoffs based on either a p-value threshold or false discovery rate control.

II. RESULTS: We have developed GNUMAP-SNP, a highly accurate approach for identifying SNPs in next generation sequencing data. By using a novel probabilistic Pair-Hidden Markov Model, GNUMAP-SNP accounts for uncertainty in both the read calls and the read mapping in an unbiased fashion. Our results show that GNUMAP-SNP has both high sensitivity and high specificity throughout the genome, particularly in repeat regions and in areas with low read coverage. In addition, we propose a statistical framework that accounts for background noise using straightforward statistical cutoffs that filter out false-positive results. The parallel implementation of SNP calling achieves near-linear speedup on both distributed-memory and shared-memory platforms.

III. AVAILABILITY: GNUMAP-SNP is available as a module in the GNUMAP probabilistic read mapping software. GNUMAP is freely available for download at: http://dna.cs.byu.edu/gnumap/.
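The abstract does not give the form of the likelihood ratio test, so the following is only a minimal sketch of the general idea, not GNUMAP-SNP's actual procedure. It assumes a simple binomial model per site: the null hypothesis is that non-reference bases arise only from a fixed background error rate, and the alternative lets the non-reference fraction be free. The function names `lrt_pvalue` and `benjamini_hochberg`, the error rate of 0.01, and the example counts are all hypothetical. Counts may be fractional, as they would be if reads were weighted by mapping probability.

```python
import math

def lrt_pvalue(ref_count, total, error_rate=0.01):
    """Binomial likelihood ratio test for one site (illustrative model only).

    H0: non-reference fraction equals error_rate (background noise).
    H1: non-reference fraction is free (estimated as alt / total).
    """
    if total <= 0:
        return 1.0
    alt = total - ref_count
    p_hat = alt / total
    # Sites with no excess of non-reference bases are not SNP candidates.
    if p_hat <= error_rate:
        return 1.0

    def loglik(p):
        # Skip zero-count terms so that log(0) is never evaluated.
        ll = 0.0
        if alt > 0:
            ll += alt * math.log(p)
        if ref_count > 0:
            ll += ref_count * math.log(1.0 - p)
        return ll

    stat = 2.0 * (loglik(p_hat) - loglik(error_rate))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return one boolean per site: True if called at the given FDR level."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank passing the BH bound
    called = [False] * m
    for i in order[:cutoff_rank]:
        called[i] = True
    return called

# Hypothetical (reference count, total coverage) pairs for four sites.
sites = [(20, 20), (10, 20), (19, 20), (0, 15)]
pvals = [lrt_pvalue(ref, tot) for ref, tot in sites]
calls = benjamini_hochberg(pvals, fdr=0.05)
print(calls)  # [False, True, False, True]
```

In this toy input, the heterozygous-looking site (10 of 20 reference) and the homozygous-variant-looking site (0 of 15 reference) pass the false discovery rate cutoff, while a single mismatch in 20 reads does not. A fixed p-value cutoff would simply compare each entry of `pvals` against a threshold instead of calling `benjamini_hochberg`.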
