Nucleic Acids Research

TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data



Abstract

Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations with clinical impact. Transpositions can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to read misalignment: since reads are short and the human genome has many repeats, false alignments create false positive predictions, while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling. First, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with others such as clustering and a positive-to-negative ratio filter, our proposed transposition caller, TranSurVeyor, shows at least a 3.1-fold improvement in F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as that of existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
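The abstract links two alignment failure modes to the evaluation metric: false alignments add false positives (lowering precision), and missing alignments add false negatives (lowering recall). F1 is the harmonic mean of the two, and a "fold improvement" in F1 is the ratio of two callers' F1-scores. The minimal Python sketch below shows that arithmetic only. It assumes calls have already been matched against a truth set (the matching rules, such as breakpoint tolerance, are not described here). The names `CallCounts`, `f1_score` and `fold_improvement` are hypothetical, and the counts in the usage section are invented for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Benchmark counts for one caller against a truth set (hypothetical helper)."""
    tp: int  # predicted transpositions that match a true transposition
    fp: int  # predicted transpositions with no true match (e.g. from false alignments)
    fn: int  # true transpositions the caller missed (e.g. from missing alignments)


def precision(c: CallCounts) -> float:
    """Fraction of predictions that are correct; 0.0 if nothing was called."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: CallCounts) -> float:
    """Fraction of true transpositions that were found (the true positive rate)."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def fold_improvement(new: CallCounts, baseline: CallCounts) -> float:
    """Ratio F1(new) / F1(baseline), i.e. an 'N-fold' improvement in F1."""
    base = f1_score(baseline)
    if base == 0.0:
        raise ValueError("baseline F1 is zero; fold change is undefined")
    return f1_score(new) / base


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT results from the paper.
    baseline = CallCounts(tp=100, fp=900, fn=400)
    improved = CallCounts(tp=400, fp=100, fn=100)
    print(f"baseline F1 = {f1_score(baseline):.3f}")
    print(f"improved F1 = {f1_score(improved):.3f}")
    print(f"fold improvement = {fold_improvement(improved, baseline):.2f}x")
```

Because F1 is a harmonic mean, a caller cannot reach a high score by trading many false positives for recall, which is why filters that remove incorrectly aligned reads can raise F1 even if they discard a few true calls.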
