Journal: Bioinformatics

Fast and SNP-tolerant detection of complex variants and splicing in short reads

Abstract

Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.

Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of ≥70 nt, and is fastest in detecting complex variants with four or more mismatches or insertions of 1-9 nt and deletions of 1-30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7-8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads.

Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http://share.gene.com/gmap.
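
Illustration: The phrase "merging and filtering position lists from a genomic index" can be made concrete with a toy sketch. The code below is not GSNAP's algorithm or its data structures. It is a simplified seed-and-vote scheme, and the names `build_index`, `candidate_starts`, `k` and `min_support` are all hypothetical. It assumes a small in-memory genome string and non-overlapping k-mers.

```python
from collections import Counter, defaultdict


def build_index(genome, k):
    """Map each k-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def candidate_starts(read, index, k, min_support):
    """Merge per-k-mer position lists into candidate read start positions.

    A k-mer at read offset `off` that occurs at genomic position `pos`
    votes for the read starting at `pos - off`. Candidates supported by
    fewer than `min_support` k-mers are filtered out.
    """
    votes = Counter()
    # Assumption for this sketch: sample non-overlapping k-mers from the read.
    for off in range(0, len(read) - k + 1, k):
        for pos in index.get(read[off:off + k], ()):
            votes[pos - off] += 1
    return sorted(
        start for start, n in votes.items()
        if n >= min_support and start >= 0
    )
```

With a 16 nt read and `k = 4`, one mismatch spoils at most one of the four k-mers, so requiring `min_support = 3` still recovers the true start position while discarding positions supported by a single chance match. Tightening `min_support` is a crude stand-in for the successive constraints the abstract describes.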