Journal: mSystems

SHI7 Is a Self-Learning Pipeline for Multipurpose Short-Read DNA Quality Control

Abstract

Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads into longer or higher-quality contigs. Many tools exist for each step, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the information about how their data were generated that they need to make informed choices, such as the expected overlap of paired-end sequences or the type of adaptors used. This increasing complexity and nuance demand a pipeline that combines existing steps in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced “shizen”), which aims to simplify quality control of short-read data for the end user by predicting the presence and/or type of common sequencing adaptors, which quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control.

IMPORTANCE

Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.). Quality control protocols typically require applying this background knowledge to select and execute numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or with data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control, from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use, and it attempts to learn the necessary parameters from the data automatically with a single command.
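Two of the properties the abstract says SHI7 predicts are the adaptor type and whether read pairs are stitchable (with their expected overlap). As a minimal illustration of what inferring such parameters from the data can involve, the Python sketch below scans reads for two widely used Illumina adaptor prefixes (TruSeq `AGATCGGAAGAGC` and Nextera `CTGTCTCTTATACACATCT`) and estimates paired-end overlap by matching the 3' end of read 1 against the reverse complement of read 2. This is not SHI7's implementation: the function names (`guess_adaptor`, `pair_overlap`, `summarize_stitchability`), the two-entry adaptor table, and every threshold (`min_overlap`, `max_mismatch_rate`, `min_fraction`, `min_stitch_fraction`) are hypothetical choices made for illustration.

```python
from __future__ import annotations

from statistics import median

# Hypothetical adaptor table for illustration: two widely used Illumina adaptor prefixes.
ADAPTORS = {
    "TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATACACATCT",
}

# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def guess_adaptor(reads: list[str], min_fraction: float = 0.01) -> str | None:
    """Return the adaptor found in the most reads, or None if it occurs in
    fewer than min_fraction of the reads (hypothetical threshold)."""
    counts = {name: sum(adaptor in read for read in reads)
              for name, adaptor in ADAPTORS.items()}
    best = max(counts, key=counts.get)
    if counts[best] < min_fraction * max(len(reads), 1):
        return None
    return best


def pair_overlap(r1: str, r2: str, min_overlap: int = 10,
                 max_mismatch_rate: float = 0.1) -> int:
    """Return the longest k >= min_overlap such that the last k bases of r1
    match the first k bases of reverse_complement(r2) within the allowed
    mismatch rate, or 0 if no such overlap exists."""
    rc2 = reverse_complement(r2)
    # Try the longest possible overlap first, so the first hit is the longest.
    for k in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-k:], rc2[:k]))
        if mismatches <= max_mismatch_rate * k:
            return k
    return 0


def summarize_stitchability(pairs: list[tuple[str, str]],
                            min_stitch_fraction: float = 0.5) -> tuple[bool, float]:
    """Call a data set stitchable if at least min_stitch_fraction of its pairs
    overlap, and report the median overlap among the overlapping pairs."""
    overlaps = [pair_overlap(r1, r2) for r1, r2 in pairs]
    hits = [k for k in overlaps if k > 0]
    stitchable = bool(pairs) and len(hits) >= min_stitch_fraction * len(pairs)
    return stitchable, (median(hits) if hits else 0)
```

For error-free 2 × 100 bp reads from a 150 bp fragment, the expected overlap is 100 + 100 − 150 = 50 bp, which is what `pair_overlap` recovers. A production pipeline would additionally need to handle inserts shorter than the read length (adaptor read-through), quality-aware mismatch scoring, and subsampling of large input files.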