Journal: Source Code for Biology and Medicine

Software for pre-processing Illumina next-generation sequencing short read sequences

Abstract

Background: Compared with Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter read lengths, higher base-call error rates, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics introduce sequencing artifacts and errors into downstream analyses, such as de novo and reference-based assembly. This lowers the quality of those analyses and may lead to incorrect interpretation of the data. Many tools have been developed for quality control and pre-processing of NGS data. However, none of them provides flexible and comprehensive trimming options together with parallel processing to speed up pre-processing of large NGS datasets.

Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl. It provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with three existing tools: CutAdapt, NGS QC Toolkit, and Trimmomatic. We also compared how short reads pre-processed by different algorithms affect de novo and reference-based assembly of three genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7.

Results: We tested several combinations of ngsShoRT algorithms on publicly available Illumina GA II, HiSeq 2000, and MiSeq short read sequences from eukaryotic and bacterial genomes. The tests focused on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across the three organisms and three sequencing platforms, trimming improved the mean quality scores of the trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved both assembly quality and assembler performance. In general, ngsShoRT outperformed comparable trimming tools in trimming speed. It also produced greater improvements in de novo and reference-based assemblies, as measured by assembly contiguity and correctness.

Conclusions: Trimming short read sequences can improve the quality of de novo and reference-based assemblies as well as assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves memory efficiency when dealing with large datasets. We recommend combining sequencing-artifact removal with quality-score-based read filtering and base trimming. This is the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide, and tutorial are available at http://research.bioinformatics.udel.edu/genomicsgsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects.
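The abstract recommends combining three steps: sequencing-artifact removal, quality-score-based read filtering, and base trimming. A minimal Python sketch can illustrate that combination. It is not ngsShoRT's implementation, which is written in Perl. Several details are assumptions chosen for illustration:

- the function names;
- Phred+33 quality encoding;
- `AGATCGGAAGAGC` as the adapter prefix;
- exact-match adapter detection;
- the thresholds (Q20, 25 bases, at most 10% N).

```python
"""Minimal read pre-processing sketch: adapter removal, 3' quality trimming,
and quality-based read filtering. Illustrative only; not ngsShoRT's code."""

# Assumptions: Phred+33 encoding and the common Illumina adapter prefix.
PHRED_OFFSET = 33
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def read_fastq(lines):
    """Yield (name, seq, qual) from 4-line FASTQ records, skipping blank lines.
    A truncated final record raises RuntimeError."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).rstrip("\r\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\r\n")
        yield header.rstrip("\r\n")[1:], seq, qual


def remove_adapter(seq, qual, adapter=ILLUMINA_ADAPTER, min_overlap=5):
    """Cut at the first exact adapter match, or at an adapter prefix that runs
    off the 3' end. Exact matching only; tools such as Cutadapt tolerate
    mismatches."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Partial overlaps at the 3' end, longest first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq, qual, min_len=25, min_mean_q=20.0, max_n_frac=0.1):
    """Keep reads that are long enough, have a high enough mean quality,
    and contain few ambiguous (N) bases."""
    if len(seq) < max(min_len, 1):
        return False
    mean_q = sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)
    n_frac = seq.upper().count("N") / len(seq)
    return mean_q >= min_mean_q and n_frac <= max_n_frac


def preprocess(records, adapter=ILLUMINA_ADAPTER, min_q=20, min_len=25):
    """Remove adapters, trim low-quality 3' bases, then filter whole reads."""
    for name, seq, qual in records:
        seq, qual = remove_adapter(seq, qual, adapter)
        seq, qual = trim_3prime(seq, qual, min_q)
        if passes_filter(seq, qual, min_len=min_len, min_mean_q=min_q):
            yield name, seq, qual
```

Adapter removal runs before quality trimming so that a high-quality adapter tail cannot shield low-quality genomic bases from the 3' quality trim. The step order and thresholds are design choices, not values reported in the paper.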
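The abstract credits ngsShoRT's parallel processing with shorter trimming time and better memory efficiency on large datasets. One common way to get both is to split the input into fixed-size chunks and trim a bounded batch of chunks at a time across worker processes. The hypothetical Python sketch below shows that pattern; it is not ngsShoRT's actual scheme. The names `parallel_trim`, `chunked`, and `chunk_size`, and the simplified Q20 / 25-base trimming rule, are all assumptions.

```python
"""Hypothetical chunked parallel trimmer; illustrative, not ngsShoRT's code."""
from itertools import islice
from multiprocessing import Pool

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def quality_trim(read, min_q=20, min_len=25):
    """Trim low-quality 3' bases; return None if the read ends up too short."""
    name, seq, qual = read
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    if end < min_len:
        return None
    return name, seq[:end], qual[:end]


def trim_chunk(chunk):
    """Worker task: trim every read in one chunk and drop rejected reads."""
    return [r for r in map(quality_trim, chunk) if r is not None]


def chunked(records, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parallel_trim(records, workers=4, chunk_size=10_000):
    """Trim reads chunk by chunk across `workers` processes, in input order.

    Each round pulls at most `workers` chunks, so memory stays roughly
    proportional to workers * chunk_size reads rather than the input size.
    """
    chunks = chunked(records, chunk_size)
    if workers <= 1:
        # Serial fallback with identical logic (useful for debugging/tests).
        for chunk in chunks:
            yield from trim_chunk(chunk)
        return
    with Pool(processes=workers) as pool:
        while True:
            batch = list(islice(chunks, workers))
            if not batch:
                break
            # pool.map preserves order, so output order matches input order.
            for trimmed in pool.map(trim_chunk, batch):
                yield from trimmed
```

On Windows and macOS, `multiprocessing` starts workers by re-importing the main module. For that reason, call `parallel_trim` with `workers > 1` only from code guarded by `if __name__ == "__main__":`. Pulling `workers` chunks per round bounds memory use. The cost is that each round waits for its slowest chunk.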
