BMC Bioinformatics

Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

Abstract

Background: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq.

Results: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome (as binary presence/absence data) and the core genome (as single nucleotide polymorphisms, SNPs) of six L. monocytogenes strains were extracted with Panseq, hierarchically clustered, and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100-locus set among 10 strains in 1 s, compared to the 449 s required to exhaustively search all possible combinations; it also found the most discriminatory 20 loci from a 96-locus E. coli O157:H7 SNP dataset.

Conclusion: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions, and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs.

Availability: Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
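
The abstract contrasts a fast loci selection with an exhaustive search over all combinations (choosing 4 loci from 100 gives 3,921,225 candidate sets). The Python sketch below illustrates that contrast on toy binary presence/absence data. It is only an illustration under stated assumptions: the greedy heuristic, the scoring by number of strain pairs told apart, and the names `pairs_distinguished`, `greedy_select`, and `exhaustive_select` are hypothetical and are not taken from Panseq, whose actual Loci Selector algorithm is not described here. The sketch also scores only discriminatory power, not variability.

```python
from itertools import combinations


def pairs_distinguished(profiles, loci):
    """Count strain pairs whose values differ at one or more of the given loci."""
    return sum(
        1
        for a, b in combinations(list(profiles), 2)
        if any(profiles[a][i] != profiles[b][i] for i in loci)
    )


def greedy_select(profiles, k):
    """Assumed heuristic: add, one at a time, the locus that most increases
    the number of distinguished strain pairs (ties go to the lowest index)."""
    n_loci = len(next(iter(profiles.values())))
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n_loci) if i not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda i: pairs_distinguished(profiles, chosen + [i]))
        chosen.append(best)
    return chosen


def exhaustive_select(profiles, k):
    """Score every k-locus combination; cost grows as C(n_loci, k)."""
    n_loci = len(next(iter(profiles.values())))
    best = max(
        combinations(range(n_loci), k),
        key=lambda combo: pairs_distinguished(profiles, combo),
    )
    return list(best)


# Toy data (invented for illustration): 4 strains x 5 accessory loci,
# 1 = region present, 0 = region absent.
profiles = {
    "strain_A": [1, 0, 1, 0, 1],
    "strain_B": [1, 1, 0, 0, 1],
    "strain_C": [0, 1, 1, 0, 1],
    "strain_D": [0, 0, 0, 0, 1],
}

print(greedy_select(profiles, 2))             # [0, 1]
print(exhaustive_select(profiles, 2))         # [0, 1]
print(pairs_distinguished(profiles, [0, 1]))  # 6 (all strain pairs)
```

On this toy input both approaches pick the same two loci, but a greedy pass evaluates only on the order of k times the number of loci candidates, whereas the exhaustive search evaluates every combination. A greedy heuristic is not guaranteed to find the optimal set in general.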
