Journal: BMC Bioinformatics

METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs

Abstract

Because microbial communities are complex, de novo assembly of next-generation sequencing data often cannot produce complete microbial genomes. Contig binning is therefore an essential step: it groups the fragmented contigs into clusters that represent microbial genomes, based on the contigs' nucleotide composition and read depth. These features work well for long contigs but are unstable for short ones. Contigs can also be linked by sequence overlap (the assembly graph) or by the paired-end reads aligned to them (the PE graph), and linked contigs are likely to derive from the same cluster. We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both the assembly and PE graphs. It rescues short contigs and corrects binning errors arising from dead ends. METAMVGL learns the weights of the two graphs automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL used significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Our findings demonstrate that METAMVGL improves the binning of short contigs and outperforms the other existing contig binning tools on these datasets.
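
The idea of propagating bin labels over two graph views can be illustrated with a minimal sketch. This is not the METAMVGL implementation: the abstract says the graph weights are learned automatically, whereas the sketch below assumes fixed, equal weights, and the function names (`combine_views`, `propagate_labels`), the toy contigs and the edge lists are all hypothetical. It only shows how an edge from the PE graph can carry a label to a contig that is unreachable in the assembly graph.

```python
def combine_views(n, assembly_edges, pe_edges, w_assembly=0.5, w_pe=0.5):
    """Merge two undirected edge lists into one weighted adjacency structure.

    The fixed weights are an assumption made for illustration only.
    """
    adj = [dict() for _ in range(n)]
    for edges, weight in ((assembly_edges, w_assembly), (pe_edges, w_pe)):
        for u, v in edges:
            adj[u][v] = adj[u].get(v, 0.0) + weight
            adj[v][u] = adj[v].get(u, 0.0) + weight
    return adj


def propagate_labels(adj, seed_labels, n_bins, iterations=50):
    """Plain label propagation: seeds stay clamped, other contigs average neighbours."""
    n = len(adj)
    scores = [[0.0] * n_bins for _ in range(n)]
    for contig, bin_id in seed_labels.items():
        scores[contig][bin_id] = 1.0

    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(adj[i].values())
            if i in seed_labels or total == 0:
                # Seeds are clamped; isolated contigs have nothing to average.
                updated.append(scores[i][:])
                continue
            updated.append([
                sum(w * scores[j][k] for j, w in adj[i].items()) / total
                for k in range(n_bins)
            ])
        scores = updated

    labels = {}
    for i in range(n):
        best = max(range(n_bins), key=lambda k: scores[i][k])
        # A contig that never received any score stays unlabeled.
        labels[i] = best if scores[i][best] > 0 else None
    return labels


# Toy data (hypothetical): contigs 0 and 3 are confidently binned seeds.
assembly_edges = [(0, 1), (3, 4)]
pe_edges = [(1, 2), (4, 5)]      # contigs 2 and 5 are linked only by read pairs
seeds = {0: 0, 3: 1}

both_views = propagate_labels(combine_views(7, assembly_edges, pe_edges), seeds, 2)
assembly_only = propagate_labels(combine_views(7, assembly_edges, []), seeds, 2)
print(both_views)
print(assembly_only)
```

In this toy case, contigs 2 and 5 receive a label only when the PE edges are included, and contig 6, which has no edges in either view, stays unlabeled in both runs.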