Journal: BMC Bioinformatics

FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics

Abstract

High-throughput DNA/RNA sequencing has revolutionized biological and clinical research. Thanks to falling costs and advancing technologies, sequencing is now widely used and generates very large amounts of data. Quickly assessing the quality of sequencing data at the gigabase-to-terabase scale has become a routine but important task, because identifying and eliminating low-quality sequence data is crucial for the reliability of downstream analysis results. There is a need for a high-speed tool that uses optimized parallel programming for batch processing and simply gauges the quality of sequencing data from multiple datasets, independent of any other processing steps.

FQStat is a stand-alone, platform-independent software tool that assesses the quality of FASTQ files using parallel programming. Based on the machine architecture and input data, FQStat automatically determines the number of cores and the amount of memory to allocate per file for optimum performance. Our results indicate that in a core-limited case, core assignment overhead exceeds the benefit of additional cores. In a core-unlimited case, performance reaches a saturation point as additional cores are assigned per file. We also show that memory allocation per file matters less for performance than the allocation of cores. FQStat summarizes its output as an HTML web page, a tab-delimited text file, and high-resolution images. It calculates and plots read count, read length, quality score, and high-quality base statistics, and it identifies and marks low-quality sequencing data to suggest removal from downstream analysis. We applied FQStat to real sequencing data to optimize performance and to demonstrate its capabilities. We also compared FQStat’s performance with similar quality control (QC) tools that utilize parallel programming and attained improvements in run time.

FQStat is a user-friendly tool with a graphical interface that employs a parallel programming architecture and automatically optimizes its performance to generate quality control statistics for sequencing data. Unlike existing tools, FQStat calculates these statistics for multiple datasets and separately at the “lane,” “sample,” and “experiment” level. This identifies low-quality subsets of a sample, so that a complete sample is not discarded when reliable data can still be obtained from it.
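To make the statistics named in the abstract concrete, the following is a minimal Python sketch of per-file read count, mean read length, mean quality score, and high-quality base fraction, computed for several FASTQ files in parallel. It is not FQStat's implementation. The function names, the Phred+33 encoding, the Q30 threshold for "high-quality" bases, and the strict four-line record layout are all assumptions made for illustration. The sketch also simply assigns one worker process per file, whereas the abstract describes FQStat choosing the number of cores and the amount of memory per file automatically.

```python
from concurrent.futures import ProcessPoolExecutor
import gzip
import os

PHRED_OFFSET = 33      # assumption: Phred+33 quality encoding
HIGH_QUALITY_MIN = 30  # assumption: a "high-quality" base has Q >= 30


def fastq_stats(lines):
    """Compute basic QC statistics from an iterable of FASTQ text lines.

    Assumes strict four-line records (header, sequence, '+', qualities).
    """
    reads = bases = quality_sum = high_quality = 0
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        scores = [ord(c) - PHRED_OFFSET for c in line.rstrip("\r\n")]
        reads += 1
        bases += len(scores)
        quality_sum += sum(scores)
        high_quality += sum(q >= HIGH_QUALITY_MIN for q in scores)
    return {
        "reads": reads,
        "mean_length": bases / reads if reads else 0.0,
        "mean_quality": quality_sum / bases if bases else 0.0,
        "high_quality_fraction": high_quality / bases if bases else 0.0,
    }


def file_stats(path):
    """Open a plain or gzip-compressed FASTQ file and compute its statistics."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fastq_stats(handle)


def batch_stats(paths, workers=None):
    """Process files in parallel, one worker process per file at a time."""
    paths = list(paths)
    if workers is None:
        workers = os.cpu_count() or 1
    workers = max(1, min(workers, len(paths)))  # never more workers than files
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(file_stats, paths)))
```

When calling `batch_stats` from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Per-file results of this kind could then be grouped by lane, sample, or experiment, provided the file names or a sample sheet carry that information.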