Journal: Current Bioinformatics

Metagenome assembly validation: Which metagenome contigs are bona fide?


Abstract

In metagenomics, long contigs can improve metagenome gene prediction and sequence binning. They can also make gene function annotation more accurate because they provide extensive genomic context. However, repetitive sequences within or between genomes can cause metagenome contigs to be misassembled, so a method for validating them is essential. Here, we propose a computational method for validating metagenome contigs. After realigning the raw sequencing reads onto a contig, we compute a contig-ECDF (empirical cumulative distribution function) and a corresponding reference ECDF obtained by computational simulation. Because the reference for a contig-ECDF is fixed once certain parameters are given, we use the difference between the two to check whether a contig is bona fide: the smaller the difference, the more likely the contig is bona fide. On simulated metagenome datasets, the method showed a good capacity to identify misassembled contigs. Applied to a real metagenome dataset, sequenced from an in vitro-simulated microbial community with known constituent genomes, the method showed a strong ability to identify bona fide contigs, and small differences between contig-ECDFs and their references were significantly correlated with contigs being bona fide. In summary, the method assigns each metagenome contig a score, and the smaller the score, the more likely the contig is bona fide. Validation on both simulated and real datasets showed that the method performs well.
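The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which quantity the contig-ECDF is built from, how the reference is simulated, or how the difference is measured, so everything below is an assumption made for illustration: the ECDF is taken over read start positions on the contig, the reference comes from placing the same number of reads uniformly at random, and the score is the maximum absolute gap between the two ECDFs (the two-sample Kolmogorov-Smirnov statistic). All function names are hypothetical and are not taken from the paper.

```python
import random
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def simulate_reference_starts(contig_len, read_len, n_reads, seed=0):
    """Assumed reference model: read starts placed uniformly on the contig."""
    rng = random.Random(seed)
    last_start = contig_len - read_len
    return [rng.randint(0, last_start) for _ in range(n_reads)]


def ecdf_distance(sample_a, sample_b):
    """Largest absolute gap between the ECDFs of two samples."""
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    points = set(sample_a) | set(sample_b)
    return max(abs(f_a(x) - f_b(x)) for x in points)


def contig_score(read_starts, contig_len, read_len, seed=0):
    """Assumed score: smaller means the contig looks more like the reference."""
    if not read_starts:
        raise ValueError("no reads aligned to this contig")
    reference = simulate_reference_starts(
        contig_len, read_len, len(read_starts), seed
    )
    return ecdf_distance(read_starts, reference)


# Toy data (not from the paper): one contig with evenly spread reads and one
# where every read piles up in the first half, as a misjoin might produce.
even_starts = simulate_reference_starts(10_000, 100, 2_000, seed=1)
piled_starts = [s // 2 for s in even_starts]

print("evenly covered contig:", contig_score(even_starts, 10_000, 100))
print("skewed contig:        ", contig_score(piled_starts, 10_000, 100))
```

Under these assumptions the skewed contig receives a much larger score than the evenly covered one, which matches the abstract's rule that a smaller score indicates a more plausible contig. A real pipeline would take read positions from an alignment file and would need a reference model that reflects the actual sequencing parameters.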
