Journal: Microbiome

A framework for assessing 16S rRNA marker-gene survey data analysis methods using mixtures.


Abstract

BACKGROUND: A variety of bioinformatic pipelines and downstream analysis methods exist for analyzing 16S rRNA marker-gene surveys. However, there is limited guidance for choosing among them, so appropriate assessment datasets and metrics are needed. Mixtures of environmental samples are useful for assessing analysis methods because expected values can be calculated from the unmixed sample measurements and the mixture design, and methods can then be evaluated against those expected values. Previous studies have used mixtures of environmental samples to assess other sequencing methods, such as RNAseq, but no studies have used mixtures of environmental samples to assess 16S rRNA sequencing.

RESULTS: We developed a framework for assessing 16S rRNA sequencing analysis methods that uses a novel two-sample titration mixture dataset and metrics to evaluate qualitative and quantitative characteristics of count tables. Our qualitative assessment evaluates feature presence/absence by examining features present only in unmixed samples or only in titrations and testing whether random sampling can account for their observed relative abundance. Our quantitative assessment evaluates feature relative and differential abundance by comparing observed and expected values. We demonstrated the framework by evaluating count tables generated with three commonly used bioinformatic pipelines: (i) DADA2, a sequence inference method; (ii) Mothur, a de novo clustering method; and (iii) QIIME, an open-reference clustering method. The qualitative assessment results indicated that the majority of Mothur and QIIME features present only in unmixed samples or only in titrations were accounted for by random sampling alone, but this was not the case for DADA2 features. Combined with count table sparsity (the proportion of zero-valued cells in a count table), these results indicate that DADA2 has a higher false-negative rate, whereas Mothur and QIIME have higher false-positive rates. The quantitative assessment results indicated that the observed relative abundance and differential abundance values were consistent with expected values for all three pipelines.

CONCLUSIONS: We developed a novel framework for assessing 16S rRNA marker-gene survey methods and demonstrated it by evaluating count tables generated with three bioinformatic pipelines. This framework is a valuable community resource for assessing 16S rRNA marker-gene survey bioinformatic methods and will help scientists identify appropriate analysis methods for their marker-gene surveys.
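
The three quantities named in the abstract can be illustrated with a small sketch. Only the sparsity definition (proportion of zero-valued cells) comes from the abstract. The linear mixing model used for the expected value and the binomial zero-count probability used for the random-sampling check are assumptions made here for illustration, and all function and parameter names are hypothetical; the abstract does not state the exact formulas or tests the authors used.

```python
from typing import List


def expected_relative_abundance(unmixed_a: float, unmixed_b: float,
                                fraction_b: float) -> float:
    """Expected relative abundance of one feature in a two-sample mixture.

    ASSUMPTION: a simple linear mixing model, where fraction_b is the
    proportion of sample B in the mixture (taken from the mixture design).
    """
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be between 0 and 1")
    return fraction_b * unmixed_b + (1.0 - fraction_b) * unmixed_a


def sparsity(count_table: List[List[int]]) -> float:
    """Proportion of zero-valued cells in a count table (features x samples)."""
    cells = [count for row in count_table for count in row]
    if not cells:
        raise ValueError("count table is empty")
    return sum(1 for count in cells if count == 0) / len(cells)


def prob_zero_by_sampling(expected_proportion: float, total_reads: int) -> float:
    """Probability of observing zero reads for a feature by chance.

    ASSUMPTION: reads are drawn independently (binomial sampling), so the
    chance of missing a feature entirely is (1 - p) ** n.
    """
    if not 0.0 <= expected_proportion <= 1.0:
        raise ValueError("expected_proportion must be between 0 and 1")
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    return (1.0 - expected_proportion) ** total_reads


if __name__ == "__main__":
    # Feature absent from sample A, at 8% in sample B, mixture is 25% B.
    print(expected_relative_abundance(0.0, 0.08, 0.25))
    # Toy count table: 2 features x 3 samples, 4 of 6 cells are zero.
    print(sparsity([[0, 5, 0], [3, 0, 0]]))
    # Chance of missing a feature at 0.01% abundance in 10,000 reads.
    print(prob_zero_by_sampling(1e-4, 10_000))
```

Under these assumptions, a feature that is missing from a sample even though its zero-count probability is very small cannot plausibly be explained by random sampling alone, which is the kind of evidence the qualitative assessment relies on.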
