Journal: BMC Genomics

Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut


Abstract

Background: The main limitations in the analysis of viral metagenomes are perhaps the high genetic variability of viruses and the lack of information in extant databases. To address these issues, several bioinformatic tools have been specifically designed or adapted for metagenomics by improving read assembly and creating more sensitive methods for homology detection. This study compares the performance of different available assemblers and taxonomic annotation software using simulated viral metagenomic data.

Results: We simulated two 454 viral metagenomes using genomes from NCBI's RefSeq database, based on the list of actual viruses found in previously published metagenomes. Three different assembly strategies, spanning six assemblers, were tested for performance: the overlap-layout-consensus algorithms Newbler, Celera and Minimo; the de Bruijn graph algorithms Velvet and MetaVelvet; and the probabilistic read model Genovo. The performance of the assemblies was measured by the length of the resulting contigs (using N50), the percentage of reads assembled and the overall accuracy when compared against the corresponding reference genomes. Additionally, the number of chimeras per contig and the lowest common ancestor were estimated in order to assess the effect of assembly on taxonomic and functional annotation. The functional classification of the reads was evaluated by counting the reads that correctly matched the functional data previously reported for the original genomes and by calculating the number of over-represented functional categories in chimeric contigs. The sensitivity and specificity of tBLASTx, PhymmBL and k-mer frequencies were measured by the accuracy of their predictions when comparing simulated reads against the NCBI RefSeq viral genomes database.

Conclusions: Assembly improves functional annotation by increasing accurate assignments and decreasing ambiguous hits between viruses and bacteria. However, its success is limited by chimeric contigs, which occur at all taxonomic levels. The assembler and its parameters should be selected based on the focus of each study. Minimo's non-chimeric contigs excelled in taxonomic assignment, and Genovo's long contigs excelled in functional annotation. tBLASTx stood out as the best approach to taxonomic annotation for virus identification. PhymmBL proved useful in datasets in which no related sequences are present, as it uses genomic features that may help identify distant taxa. The k-mer frequency approach underperformed in all viral datasets.
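The abstract names N50, sensitivity and specificity as evaluation metrics but does not spell out how they were computed. As a rough illustration only, the sketch below uses the standard textbook definitions, which may differ in detail from the study's own procedure. The function names and the example numbers are hypothetical and do not come from the paper.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly (standard definition, assumed here).

    N50 is the contig length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches half")


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive cases that were predicted positive."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative cases that were predicted negative."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from the study.
    print(n50([100, 200, 300, 400, 500]))
    # Hypothetical confusion-matrix counts for a taxonomic classifier.
    print(sensitivity(true_pos=8, false_neg=2))
    print(specificity(true_neg=9, false_pos=1))
```

Because N50 depends only on the distribution of contig lengths, a few long chimeric contigs can raise it; this is one reason the study pairs it with accuracy against the reference genomes and with chimera counts.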
