Journal: BMC Bioinformatics

Joining Illumina paired-end reads for classifying phylogenetic marker sequences



Abstract

Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only the first reads for taxonomy annotation, but that wastes the information in the second reads. Presumably, including the second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported.

We evaluated two methods of joining, as opposed to merging, PE reads into single reads for taxonomy annotation, using simulated data with sequencing errors. Our evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own real MiSeq PE data from the nasal microbiota of asthmatic children. Before joining, trimming low-quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined, trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation.

When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Reference sequences may need to be rearranged accordingly, depending on the classifier. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited.
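
To make the distinction between joining and merging concrete, here is a minimal Python sketch of the general idea: trim each read at its low-quality 3' end, then concatenate read 1 with the reverse complement of read 2 without requiring any overlap. It is only an illustration under stated assumptions. The function names, the Q20 cutoff, and the optional spacer are hypothetical, and the abstract does not specify the two joining methods that were actually evaluated.

```python
# Illustrative sketch only. The function names, the Q20 threshold, and the
# optional spacer are assumptions; they are not taken from the paper.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Cut the read at the first base whose Phred score is below min_q."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i]
    return seq


def join_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Concatenate read 1 with the reverse complement of read 2.

    Unlike merging, no overlap between the two reads is required.
    """
    return read1 + spacer + reverse_complement(read2)


if __name__ == "__main__":
    r1 = trim_3prime("ACGTAC", [30, 30, 30, 30, 30, 30])
    r2 = trim_3prime("GGTTAA", [30, 30, 30, 30, 10, 10])
    print(join_pair(r1, r2))
```

Whether a spacer is useful, and whether the reference sequences need to be rearranged to match the joined layout, would depend on the classifier, as the abstract notes for alignment-based top-hit methods.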
