Journal: Microbiome

Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies

Abstract

There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources. Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power. Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
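
The permutation idea described in the abstract can be illustrated with a minimal sketch. When case/control labels are shuffled at random, no OTU is truly differentially abundant, so the fraction of tests with p < alpha estimates a method's false positive rate. Everything below is an assumption made for illustration and is not the authors' benchmarking framework: the count table is synthetic, the helper names (`simulate_counts`, `rank_sum_pvalue`, `permuted_false_positive_rate`) are hypothetical, and the test is a plain Wilcoxon rank-sum test on relative abundances, using a normal approximation with tie correction and no continuity correction.

```python
import math
import random


def relative_abundance(counts):
    """Convert one sample's raw read counts to proportions of its library size."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i                    # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value is tied, so there is no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_counts(rng, n_samples, n_otus):
    """Synthetic sparse count table (rows: samples, columns: OTUs). Illustrative only."""
    otu_mean = [rng.uniform(5, 200) for _ in range(n_otus)]
    otu_zero = [rng.uniform(0.1, 0.8) for _ in range(n_otus)]  # per-OTU sparsity
    table = []
    for _ in range(n_samples):
        depth = rng.uniform(0.5, 2.0)  # crude stand-in for sequencing depth variation
        table.append([
            0 if rng.random() < otu_zero[j]
            else int(rng.expovariate(1.0 / (otu_mean[j] * depth)))
            for j in range(n_otus)
        ])
    return table


def permuted_false_positive_rate(table, n_cases, n_permutations, alpha, rng):
    """Fraction of OTU tests with p < alpha under randomly shuffled case/control labels."""
    props = [relative_abundance(row) for row in table]
    n_otus = len(table[0])
    indices = list(range(len(table)))
    hits = tests = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)
        cases, controls = indices[:n_cases], indices[n_cases:]
        for j in range(n_otus):
            p = rank_sum_pvalue([props[i][j] for i in cases],
                                [props[i][j] for i in controls])
            if p < alpha:
                hits += 1
            tests += 1
    return hits / tests


rng = random.Random(0)
table = simulate_counts(rng, n_samples=60, n_otus=40)
fpr = permuted_false_positive_rate(table, n_cases=30, n_permutations=25,
                                   alpha=0.05, rng=rng)
print(f"Empirical false positive rate at alpha=0.05: {fpr:.3f}")
```

A well-calibrated test should give a rate close to the nominal alpha. Changing `n_cases` relative to the number of samples, or raising the zero probabilities in `simulate_counts`, gives a rough feel for the case/control balance and sparsity sensitivities that the abstract mentions.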
