Journal: mSystems

Performance of Microbiome Sequence Inference Methods in Environments with Varying Biomass



Abstract

Microbiome community composition plays an important role in human health, and while most research to date has focused on high-microbial-biomass communities, low-biomass communities are also important. However, contamination and technical noise make it difficult to determine the true community signal when biomass levels are low, and the influence of varying biomass on sequence processing methods has received little attention. Here, we benchmarked six methods that infer community composition from 16S rRNA sequence reads, using samples of varying biomass. We included two operational taxonomic unit (OTU) clustering algorithms, one entropy-based method, and three more recent amplicon sequence variant (ASV) methods. We first compared inference results from high-biomass mock communities to assess baseline performance. We then benchmarked the methods on a dilution series made from a single mock community, that is, samples that varied only in biomass. ASVs/OTUs inferred by each method were classified as representing the expected community, technical noise, or contamination. With the high-biomass data, we found that the ASV methods had good sensitivity and precision, whereas the other methods fell short in one or both of these measures. Inferred contamination was present only in small proportions. With the dilution series, contamination represented an increasing proportion of the inferred communities as biomass decreased, regardless of the inference method used. However, the correlation between inferred contaminants and sample biomass was strongest for the ASV methods and weakest for the OTU methods. Thus, no inference method on its own can distinguish true community sequences from contaminant sequences, but ASV methods provide the most accurate characterization of both community and contaminants.

IMPORTANCE Microbial communities have important ramifications for human health, but determining their impact requires accurate characterization. Current technology makes microbiome sequence data more accessible than ever. However, popular software methods for analyzing these data are based on algorithms developed alongside older sequencing technology and smaller data sets and thus may not be adequate for modern, high-throughput data sets. Additionally, samples from environments where microbes are scarce pose community-characterization challenges beyond those of high-biomass environments, an issue that is often ignored. We found that a new class of microbiome sequence processing tools, called amplicon sequence variant (ASV) methods, outperformed conventional methods. In low-biomass samples, where contamination becomes a significant confounding factor, the improved accuracy of ASV methods may allow more robust computational identification of contaminants.
