Journal: Nucleic Acids Research

A novel conceptual approach to read-filtering in high-throughput amplicon sequencing studies

Abstract

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present the Poisson binomial filtering algorithm (PBF), which minimizes both problems by calculating the error-probability distribution of a sequence from its quality scores. To validate our method, we quality-filtered 37 publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to those obtained with previous approaches such as the ones included in mothur, QIIME and USEARCH. Our algorithm retained substantially more reads than its predecessors, while resulting in fewer and more accurate OTUs. This improved sensitivity produced more faithful representations, both quantitatively and qualitatively, of the true microbial diversity present in the studied samples. Furthermore, the method introduced in this work is computationally inexpensive and can be readily applied in conjunction with any existing analysis pipeline.
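The abstract's core idea, deriving the error-probability distribution of a read from its quality scores, can be illustrated with a minimal Python sketch. It assumes Phred-scaled qualities (per-base error probability 10^(-Q/10)) and independent errors across bases, so the number of errors in a read follows a Poisson binomial distribution. The function names, the `max_errors` and `alpha` parameters, and the acceptance rule in `keep_read` are hypothetical choices made for illustration; the abstract does not specify the authors' actual criterion or implementation.

```python
def phred_to_error_probs(quals):
    """Convert Phred quality scores to per-base error probabilities."""
    return [10 ** (-q / 10) for q in quals]


def poisson_binomial_pmf(probs):
    """Return P(exactly k errors) for k = 0..len(probs).

    Uses a simple dynamic program: bases are added one at a time and
    each either contributes an error (probability p) or does not.
    """
    pmf = [1.0]  # with zero bases, zero errors is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this base is correct
            nxt[k + 1] += mass * p     # this base is an error
        pmf = nxt
    return pmf


def keep_read(quals, max_errors=1, alpha=0.05):
    """Hypothetical rule: keep the read if the probability of it having
    more than `max_errors` errors is at most `alpha`."""
    pmf = poisson_binomial_pmf(phred_to_error_probs(quals))
    prob_too_many = 1.0 - sum(pmf[: max_errors + 1])
    return prob_too_many <= alpha


if __name__ == "__main__":
    high_quality = [40] * 100   # per-base error probability 1e-4
    low_quality = [10] * 100    # per-base error probability 0.1
    print(keep_read(high_quality), keep_read(low_quality))
```

Under these assumptions, a 100-base read of uniform Q40 is kept and a 100-base read of uniform Q10 is discarded. The dynamic program is quadratic in read length, which is inexpensive for typical amplicon reads.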