Journal: Bioinformatics

Using 2k + 2 bubble searches to find single nucleotide polymorphisms in k-mer graphs

Abstract

Motivation: Single nucleotide polymorphism (SNP) discovery is an important preliminary step toward understanding genetic variation. With current sequencing methods, we can sample genomes comprehensively. SNPs are found by aligning sequence reads against longer assembled references. De Bruijn graphs are efficient data structures that can deal with the vast amount of data from modern sequencing technologies. Recent work has shown that the topology of these graphs captures enough information to allow the detection and characterization of genetic variants, offering an alternative to alignment-based methods. Such methods rely on depth-first walks of the graph to identify closing bifurcations. These methods are either conservative or generate many false-positive results, particularly when traversing highly interconnected (complex) regions of the graph or regions of very high coverage.

Results: We devised an algorithm that calls SNPs in converted De Bruijn graphs by enumerating 2k + 2 cycles. We evaluated the accuracy of predicted SNPs by comparison with SNP lists from alignment-based methods. We tested the accuracy of SNP calling using sequence data from 16 ecotypes of Arabidopsis thaliana and found that accuracy was high. We found that SNP calling was even across the genome and across genomic feature types. Using sequence-based attributes of the graph to train a decision tree allowed us to increase the accuracy of SNP calls further. Together, these results indicate that our algorithm is capable of finding SNPs accurately in complex sub-graphs and potentially comprehensively from whole-genome graphs.

Availability and implementation: The source code for a C++ implementation of our algorithm is available under the GNU Public Licence v3 at: . The datasets used in this study are available at the European Nucleotide Archive, reference ,

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.
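Why a SNP shows up as a 2k + 2 cycle can be illustrated with a small sketch. With k-mers as nodes, each allele of a SNP contributes k k-mers that contain the variant base, and the two alleles share one flanking k-mer on each side, so the two branches close into a cycle of 2 + 2k nodes. The sketch below is a minimal illustration under stated assumptions, not the authors' C++ implementation: it assumes error-free reads, a plain directed k-mer graph rather than the paper's "converted" graph, and no coverage filtering or decision-tree step. The function names `build_graph`, `paths_of_length` and `find_snp_bubbles` are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Directed k-mer graph (assumed layout): edge u -> v when v follows u in a read."""
    succ = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            succ[read[i:i + k]].add(read[i + 1:i + k + 1])
    return succ


def paths_of_length(succ, start, n_edges):
    """Bounded depth-first enumeration of simple paths with exactly n_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if len(path) == n_edges + 1:
            yield path
            continue
        # .get avoids inserting new keys into the defaultdict while walking
        for nxt in succ.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def find_snp_bubbles(reads, k):
    """Hypothetical SNP caller: report pairs of (k + 1)-edge branches that reconverge.

    Two such branches from a branching k-mer u to a merging k-mer w spell
    u + c + w, so they can differ only in the single base c: a SNP whose
    bubble has 2 shared nodes plus k interior nodes per branch (2k + 2 total).
    """
    succ = build_graph(reads, k)
    calls = set()
    for u in list(succ):
        if len(succ[u]) < 2:
            continue  # only branching k-mers can open a bubble
        by_end = defaultdict(list)
        for path in paths_of_length(succ, u, k + 1):
            by_end[path[-1]].append(path)
        for end, paths in by_end.items():
            for i in range(len(paths)):
                for j in range(i + 1, len(paths)):
                    a, b = paths[i], paths[j]
                    # interior k-mers must be distinct for a proper 2k + 2 cycle
                    if set(a[1:-1]).isdisjoint(b[1:-1]):
                        # the variant base is the last base of the first interior k-mer
                        alleles = frozenset((a[1][-1], b[1][-1]))
                        calls.add((u, end, alleles))
    return calls


if __name__ == "__main__":
    reference = "ATCGGACTTG"
    variant = "ATCGGTCTTG"  # A -> T substitution at 0-based position 5
    for branch, merge, alleles in find_snp_bubbles([reference, variant], k=3):
        print(branch, merge, sorted(alleles))
```

With k = 3, the two toy reads yield one call: the bubble opens at `CGG`, closes at `CTT`, and carries the alleles A and T, giving a cycle of 8 = 2k + 2 k-mers. Note that this sketch relies on a bounded depth-first walk, closer in spirit to the earlier approaches the abstract criticizes; it is meant only to show the cycle structure, not to handle complex or high-coverage regions.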