PLoS Computational Biology

The Effects of Alignment Quality, Distance Calculation Method, Sequence Filtering, and Region on the Analysis of 16S rRNA Gene-Based Studies

Abstract

Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has raised questions that were not fully appreciated by researchers using traditional Sanger sequencing-based methods. These include the effect of alignment quality, the best method for calculating pairwise genetic distances between 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments on pairwise genetic distances depended strongly on the variation in sequence length within a region; however, the choice of calculation method had only a subtle effect on a sample's richness or phylogenetic diversity for that region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances, muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each variable region correlated poorly with those calculated for the full-length gene. Thus, although it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, doing so is not advisable. Analysis of β-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results urge caution in the design and interpretation of analyses using pyrosequencing data.
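To make the gap-treatment and masking questions concrete, here is a minimal Python sketch of an uncorrected pairwise distance between two aligned sequences. It is not the paper's code or mothur's implementation. The method names `nogaps`, `eachgap`, and `onegap` echo the `calc` options of mothur's `dist.seqs` command, but the function `pairwise_distance`, the 0/1 mask string, treating both `-` and `.` as gaps, and scoring ambiguous bases as ordinary mismatches are all assumptions made for illustration.

```python
from typing import Optional

# Assumption: both '-' and '.' are treated as gap characters.
GAP_CHARS = frozenset("-.")
METHODS = ("nogaps", "eachgap", "onegap")


def pairwise_distance(a: str, b: str, calc: str = "onegap",
                      mask: Optional[str] = None) -> float:
    """Uncorrected distance between two sequences from the same alignment.

    calc (hypothetical names modeled on mothur's dist.seqs options):
      "nogaps"  - skip every column in which exactly one sequence has a gap
      "eachgap" - count every such gapped column as one difference
      "onegap"  - count each run of consecutive gaps in one sequence as one difference
    mask: optional string of '0'/'1', one character per column; '0' drops the column.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    if mask is not None and len(mask) != len(a):
        raise ValueError("mask must have one character per alignment column")
    if calc not in METHODS:
        raise ValueError(f"unknown calc method: {calc!r}")

    diffs = 0        # numerator: mismatches plus scored gaps
    length = 0       # denominator: columns (or gap runs) that were compared
    gap_side = None  # which sequence the current gap run is in ('a', 'b' or None)

    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if mask is not None and mask[i] == "0":
            continue  # column removed by the mask
        gx, gy = x in GAP_CHARS, y in GAP_CHARS
        if gx and gy:
            continue  # shared gap: carries no information about either sequence
        if gx or gy:
            side = "a" if gx else "b"
            # eachgap scores every gapped column; onegap only the first column of a run
            if calc == "eachgap" or (calc == "onegap" and side != gap_side):
                diffs += 1
                length += 1
            gap_side = side
            continue
        gap_side = None  # a base-base column ends any gap run
        length += 1
        if x != y:
            diffs += 1  # ambiguous bases such as N are scored as ordinary mismatches

    if length == 0:
        raise ValueError("no comparable columns after gap handling and masking")
    return diffs / length


if __name__ == "__main__":
    a = "ACGT--ACGT"
    b = "ACGTTTACGA"
    for method in METHODS:
        print(method, round(pairwise_distance(a, b, calc=method), 4))
    # nogaps 0.125, eachgap 0.3, onegap 0.2222
    # Masking out the two gapped columns makes all three methods agree:
    print(pairwise_distance(a, b, calc="eachgap", mask="1111001111"))  # 0.125
```

Because each treatment changes both the numerator and the denominator, the same pair of sequences can land on different sides of a fixed cutoff (for example the common 0.03, or 97% similarity, threshold) depending on the method, which is one way gap handling can shift OTU-based richness. Masking has a similar effect by discarding columns before any differences are counted. In this sketch terminal gaps are treated exactly like internal gaps; a real pipeline may handle sequence ends separately.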