Journal: BMC Bioinformatics

4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information


Abstract

Next-generation sequencing datasets are becoming more common, and their use in population studies is becoming widespread. For non-model species without a reference genome, a set of SNPs can be identified from a panel of individuals and then used for further population genotyping. However, the lack of a reference genome against which the sequenced data could be compared makes finding SNPs more difficult. Additionally, when the data sources (strains) are not identified (e.g. in datasets of pooled individuals), finding reliable variation becomes harder still because of the lack of specialized software for this specific task. Here we describe 4Pipe4, a 454 data analysis pipeline particularly focused on SNP detection when no reference or strain information is available. It uses a command line interface to automatically call other programs, parse their outputs and summarize the results. The variation detection routine is built into the program itself. Despite being optimized for SNP mining in 454 EST data, it is flexible enough to automate the analysis of genomic data or even data from other NGS technologies. 4Pipe4 outputs several HTML-formatted reports with metrics on many of the most common assembly values, as well as on all the variation found. There is also a module available for finding putative SSRs in the analysed datasets. This program can be especially useful for researchers who have 454 datasets of a panel of pooled individuals and want to discover and characterize SNPs for subsequent individual genotyping with customized genotyping arrays. In comparison with other SNP detection approaches, 4Pipe4 showed the best validation ratio, retrieving a smaller number of SNPs but with a considerably lower false positive rate than other methods. 4Pipe4's source code is available at https://github.com/StuntsPT/4Pipe4.
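To make the idea of reference-free SNP detection in pooled data more concrete, the following is a minimal sketch of one generic approach: scan each column of reads that have already been aligned into a de novo contig, and flag positions where a second allele is supported by enough reads. This is not 4Pipe4's actual routine, whose criteria are not given in the abstract. The function name `call_putative_snps`, the thresholds, and the toy reads are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def call_putative_snps(aligned_reads, min_coverage=4,
                       min_minor_count=2, min_minor_freq=0.2):
    """Flag alignment columns that carry a well-supported second allele.

    aligned_reads: equal-length strings from one contig alignment, with '-'
    for gaps. All thresholds are illustrative assumptions, not values taken
    from 4Pipe4.
    Returns a list of (position, major_allele, minor_allele, minor_freq).
    """
    snps = []
    if not aligned_reads:
        return snps
    length = max(len(read) for read in aligned_reads)
    for pos in range(length):
        # Keep only unambiguous bases; gaps and N do not count as coverage.
        column = [read[pos].upper() for read in aligned_reads
                  if pos < len(read) and read[pos].upper() in "ACGT"]
        if len(column) < min_coverage:
            continue
        top_two = Counter(column).most_common(2)
        if len(top_two) < 2:
            continue  # monomorphic column
        (major, _), (minor, minor_count) = top_two
        minor_freq = minor_count / len(column)
        # Requiring several reads for the minor allele filters out
        # singletons, which are more likely to be sequencing errors.
        if minor_count >= min_minor_count and minor_freq >= min_minor_freq:
            snps.append((pos, major, minor, round(minor_freq, 3)))
    return snps


# Toy alignment: column 3 has T/A in 3 vs 2 reads; column 7 has a lone A.
example_reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACGT",
    "ACG-ACGA",
]

if __name__ == "__main__":
    print(call_putative_snps(example_reads))  # [(3, 'T', 'A', 0.4)]
```

In this toy case the single A at the last column is ignored because only one read supports it. That mirrors the trade-off described in the abstract: stricter filtering returns fewer SNPs but lowers the false positive rate.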
