Journal: BMC Genomics

SQUAT: a Sequencing Quality Assessment Tool for data quality assessments of genome assemblies



Abstract

With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are in progress or available only as drafts rather than as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only for people who perform the assembly or re-assembly, but also for those who use assemblies as maps in downstream analyses. Recent studies on quality control and quality assessment of genome assemblies have focused either on quality control of reads before assembly or on evaluation of assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is not applicable to de novo assembly projects. Hence, methods that provide both pre-assembly and post-assembly quality assessment reports, covering the quality and correctness of de novo assemblies as well as the input reads, are worth developing.

We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a portable HTML report that visualizes the distribution of high- and poor-quality reads. The post-assembly module of SQUAT provides read mapping analytics, also in HTML format. We categorized reads as uniquely mapped, multiply mapped, or unmapped; uniquely mapped reads were further categorized as perfectly matched, containing substitutions, containing clips, or others. We carefully divided the poorly mapped (PM) reads into several groups to prevent underestimation of unmapped reads; a high PM% is a sign of a poor assembly that requires further examination or improvement before the assembly is used.

Finally, we evaluated SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful, detailed information for assessing the quality of assemblies and reads. The SQUAT software, with links to both its Docker image and the online manual, is freely available at https://github.com/luke831215/SQUAT .
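The read categories described in the abstract can be illustrated with a minimal sketch. This is not SQUAT's implementation: the `Alignment` record, the label names, the rule that clips take precedence over indels, and the caller-supplied set of labels counted as poorly mapped are all assumptions made for illustration, since the abstract does not spell out the exact criteria.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# Matches one CIGAR operation, e.g. "90M" or "10S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    """Hypothetical, simplified view of one read's mapping result."""
    read_id: str
    cigar: Optional[str]      # None when the read is unmapped
    edit_distance: int = 0    # mismatches plus indel bases (like the SAM NM tag)
    num_hits: int = 1         # number of reported mapping locations


def classify(aln: Alignment) -> str:
    """Assign one illustrative category label to a read."""
    if aln.cigar is None or aln.num_hits == 0:
        return "unmapped"
    if aln.num_hits > 1:
        return "multiply_mapped"
    ops = {op for _, op in CIGAR_OP.findall(aln.cigar)}
    if ops & {"S", "H"}:          # soft or hard clipping (assumed to take precedence)
        return "unique_clipped"
    if ops & {"I", "D", "N", "P"}:  # indels and similar: grouped as "others" here
        return "unique_other"
    if aln.edit_distance == 0 and "X" not in ops:
        return "unique_perfect"
    return "unique_substitutions"


def pm_percent(alignments: Iterable[Alignment],
               poor_labels: Set[str]) -> Tuple[Counter, float]:
    """Count reads per category and return the share falling in poor_labels."""
    counts = Counter(classify(a) for a in alignments)
    total = sum(counts.values())
    poor = sum(counts[label] for label in poor_labels)
    return counts, (100.0 * poor / total if total else 0.0)


# Toy input; the choice of poor_labels below is an assumption, not SQUAT's definition.
reads = [
    Alignment("r1", "100M", edit_distance=0),
    Alignment("r2", "100M", edit_distance=2),
    Alignment("r3", "90M10S"),
    Alignment("r4", None, num_hits=0),
    Alignment("r5", "100M", num_hits=3),
]
counts, pm = pm_percent(reads, {"unmapped", "unique_clipped"})
print(dict(counts), pm)
```

Passing the poorly mapped labels as a parameter keeps the sketch neutral about which categories count toward PM%; consult the SQUAT manual for the definitions the tool actually uses.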
