Journal: BMC Genomics

BacTag - a pipeline for fast and accurate gene and allele typing in bacterial sequencing data based on database preprocessing



Abstract

Bacteria carry a wide array of genes, some of which have multiple alleles. These different alleles are often responsible for distinct types of virulence and can determine classification at the subspecies level (e.g., housekeeping genes for Multi Locus Sequence Typing, MLST). Therefore, it is important to rapidly detect not only the gene of interest, but also the relevant allele. Current sequencing-based methods are limited to mapping reads to each of the known allele references, which is time-consuming. To address this limitation, we developed BacTag, a pipeline that rapidly and accurately detects which genes are present in a sequencing dataset and reports the allele of each identified gene. We exploit the fact that different alleles of the same gene are highly similar. Instead of mapping the reads to each of the allele reference sequences, we preprocess the database prior to the analysis, which makes the subsequent gene and allele identification efficient. During the preprocessing, we determine a representative reference sequence for each gene and store the differences between every allele and this chosen reference. During the analysis, we estimate whether the gene is present in the sequencing data by mapping the reads to this reference sequence; if the gene is found, we compare the observed variants to those in the preprocessed database. This allows us to detect which specific allele is present in the sequencing data. Our pipeline was successfully tested on artificial WGS E. coli, S. pseudintermedius, P. gingivalis, M. bovis, Borrelia spp. and Streptomyces spp. data and on real WGS E. coli and K. pneumoniae data, reporting alleles of MLST housekeeping genes. We developed a new pipeline for fast and accurate gene and allele recognition based on database preprocessing and parallel computing; it performed better than or comparably to currently popular tools. We believe that our approach can be useful for a wide range of projects, including bacterial subspecies classification, clinical diagnostics of bacterial infections, and epidemiological studies.
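
The preprocessing idea described in the abstract (store each allele as its differences from one representative reference, then match observed variants against those stored differences) can be illustrated with a minimal Python sketch. This is not BacTag's implementation: the function names, the toy sequences, the allele labels such as `adk_1`, and the restriction to equal-length alleles that differ only by substitutions are all assumptions made for illustration. A real pipeline would obtain the observed variants from read mapping and variant calling and would also have to handle insertions and deletions.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# A variant is (0-based position, alternative base). Substitutions only (assumption).
Variant = Tuple[int, str]


def diff_to_reference(reference: str, allele: str) -> FrozenSet[Variant]:
    """Return the substitutions that turn `reference` into `allele`."""
    if len(reference) != len(allele):
        raise ValueError("this toy sketch handles equal-length alleles only")
    return frozenset(
        (pos, alt)
        for pos, (ref, alt) in enumerate(zip(reference, allele))
        if ref != alt
    )


def preprocess(alleles: Dict[str, str], reference_id: str) -> Dict[FrozenSet[Variant], str]:
    """Index every allele by its variant set relative to the chosen reference."""
    reference = alleles[reference_id]
    return {diff_to_reference(reference, seq): name for name, seq in alleles.items()}


def call_allele(index: Dict[FrozenSet[Variant], str],
                observed: Iterable[Variant]) -> Optional[str]:
    """Look up the allele whose stored variant set equals the observed one."""
    return index.get(frozenset(observed))


# Hypothetical toy database: three alleles of one gene, "adk_1" chosen as reference.
toy_alleles = {
    "adk_1": "ACGTACGT",
    "adk_2": "ACGAACGT",  # differs from adk_1 at position 3
    "adk_3": "ACGAACGA",  # differs from adk_1 at positions 3 and 7
}
toy_index = preprocess(toy_alleles, "adk_1")

# Variants that a variant caller might report against the reference (made up here).
print(call_allele(toy_index, [(3, "A")]))            # adk_2
print(call_allele(toy_index, [(3, "A"), (7, "A")]))  # adk_3
print(call_allele(toy_index, []))                    # adk_1 (no differences)
print(call_allele(toy_index, [(0, "T")]))            # None (unknown allele)
```

The point of the sketch is that the expensive comparison of alleles happens once, at preprocessing time. At analysis time the reads are mapped to a single reference per gene, and allele identification reduces to a dictionary lookup on the observed variant set.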
