Journal: BMC Bioinformatics

Rapid evaluation and quality control of next generation sequencing data with FaQCs

Abstract

Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform’s sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects.

Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics.

Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
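
To make two of the ideas in the abstract concrete (quality that declines toward the end of a read, and conversion between FASTQ quality encodings), the following is a minimal Python sketch. It is not FaQCs's actual algorithm or interface, which the abstract does not describe. The function names, the simple "strip trailing low-quality bases" rule, and the threshold of Q20 are all assumptions chosen for illustration.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (hypothetical helper)."""
    # Each character stores (score + offset); shift the offset from 64 to 33.
    return "".join(chr(ord(c) - 64 + 33) for c in quality)


def trim_3prime(sequence: str, quality: str, min_q: int = 20, offset: int = 33):
    """Drop trailing bases whose Phred score is below min_q.

    min_q=20 is an assumed, illustrative threshold, not a FaQCs default.
    """
    end = len(sequence)
    # Walk back from the 3' end while the last remaining base is low quality.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
    print(phred64_to_phred33("hhh"))
```

A real quality-control tool would typically also consider read length after trimming, ambiguous bases, and paired reads; this sketch only shows the basic per-read operation.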
