Source: U.S. National Institutes of Health literature, Bioinformatics

Fast and SNP-tolerant detection of complex variants and splicing in short reads

Abstract

Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.

Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of ≥70 nt, and is fastest in detecting complex variants with four or more mismatches or insertions of 1–9 nt and deletions of 1–30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7–8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads.

Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at .

Contact:
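
The abstract's phrase "merging and filtering position lists from a genomic index" can be illustrated with a toy example. The sketch below is not GSNAP's actual algorithm or data structures. It is a minimal Python illustration under stated assumptions: a plain dictionary stands in for the genomic index, the read is split into non-overlapping k-mers, and the function names (build_index, candidate_starts) and parameters (k, min_support) are hypothetical. Each k-mer contributes a list of genomic positions. Shifting those positions by the k-mer's offset in the read and counting agreements merges the lists into candidate read start positions, and the support threshold then filters them.

```python
from collections import Counter, defaultdict


def build_index(genome, k):
    """Map each k-mer to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def candidate_starts(read, index, k, min_support):
    """Merge per-k-mer position lists, then keep well-supported starts."""
    support = Counter()
    # Step through the read in non-overlapping k-mers (an assumption
    # made for this sketch).
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # implied start of the read in the genome
            if start >= 0:
                support[start] += 1
    # Filter: keep only starts that enough k-mers agree on.
    return sorted(s for s, n in support.items() if n >= min_support)


if __name__ == "__main__":
    genome = "ACGTTGCAAGCTTAGGCTAACGGATCCGTA"
    index = build_index(genome, k=4)
    # Copy of genome[8:20] with one mismatch at read position 5 (A -> C).
    read = "AGCTTCGGCTAA"
    print(candidate_starts(read, index, k=4, min_support=2))  # prints [8]
```

In this toy run the middle k-mer contains the mismatch and matches nothing in the index, but the two flanking k-mers still agree on start position 8, so the candidate survives the filter. Raising min_support to 3 would reject it, which shows how the threshold trades mismatch tolerance against specificity.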
