Journal: GigaScience

PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria

Abstract

Background: Cataloguing the distribution of genes within natural bacterial populations is essential for understanding evolutionary processes and the genetic basis of adaptation. Advances in whole genome sequencing technologies have led to a vast expansion in the number of bacterial genomes deposited in public databases. There is a pressing need for software solutions that can cluster, catalogue and characterise genes, or other features, in increasingly large genomic datasets.

Results: Here we present a pangenomics toolbox, PIRATE (Pangenome Iterative Refinement and Threshold Evaluation), which identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds. PIRATE builds upon recent scalable software developments to allow for the rapid interrogation of thousands of isolates. PIRATE clusters genes (or other annotated features) over a wide range of amino acid or nucleotide identity thresholds and uses the clustering information to rapidly identify paralogous gene families and putative fission/fusion events. Furthermore, PIRATE orders the pangenome using a directed graph, provides a measure of allelic variation, and estimates sequence divergence for each gene family.

Conclusions: We demonstrate that PIRATE scales linearly with both number of samples and computation resources, allowing for analysis of large genomic datasets, and compares favorably to other popular tools. PIRATE provides a robust framework for analysing bacterial pangenomes, from largely clonal to panmictic species.
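The idea of clustering genes over a range of identity thresholds and then flagging paralogous families can be illustrated with a toy sketch. This is not PIRATE's implementation: the single-linkage clustering, the function names (`cluster_at_threshold`, `has_paralogs`, `sweep`), and the input layout (a dictionary of pairwise percent identities and a gene-to-genome map) are all assumptions made for illustration. A cluster is flagged as putatively paralogous when it holds more than one gene from the same genome.

```python
from collections import defaultdict


def cluster_at_threshold(genes, identities, threshold):
    """Single-linkage clusters: join genes whose pairwise identity >= threshold.

    Toy stand-in for a real clustering step; not PIRATE's algorithm.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), pct in identities.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for g in genes:
        groups[find(g)].add(g)
    # Sorted lists give a deterministic, comparable result.
    return sorted(sorted(members) for members in groups.values())


def has_paralogs(cluster, genome_of):
    """True if any genome contributes more than one gene to the cluster."""
    genomes = [genome_of[g] for g in cluster]
    return len(genomes) != len(set(genomes))


def sweep(genes, genome_of, identities, thresholds):
    """Cluster at each threshold and flag putatively paralogous clusters."""
    result = {}
    for t in sorted(thresholds):
        clusters = cluster_at_threshold(genes, identities, t)
        result[t] = [(c, has_paralogs(c, genome_of)) for c in clusters]
    return result


# Hypothetical input: four genes from three genomes, with made-up identities.
genome_of = {"A1": "A", "A2": "A", "B1": "B", "C1": "C"}
identities = {
    ("A1", "B1"): 98.0,
    ("A1", "A2"): 60.0,
    ("B1", "C1"): 85.0,
    ("A2", "B1"): 58.0,
}
families = sweep(list(genome_of), genome_of, identities, [50, 80, 95])
```

With this made-up input, the loosest threshold merges all four genes into one cluster that is flagged because genome A contributes two genes. Raising the threshold splits off the divergent copy `A2`, which is the kind of signal a threshold sweep exposes.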