Genome Biology

Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Abstract

A primary component of next-generation sequencing analysis is to align short reads to a reference genome, with each read aligned independently. However, reads that observe the same non-reference DNA sequence are highly correlated and can be used to better model the true variation in the target genome. A novel short-read micro re-aligner, SRMA, that leverages this correlation to better resolve a consensus of the underlying DNA sequence of the targeted genome is described here.

Whole-genome human re-sequencing is now feasible using next-generation sequencing technology. Technologies such as those produced by Illumina, Life, and Roche 454 produce millions to billions of short DNA sequences that can be used to reconstruct the diploid sequence of a human genome. Ideally, such data alone could be used to de novo assemble the genome in question [1-6]. However, the short read lengths (25 to 125 bases), the size and repetitive nature of the human genome (3.2 × 10^9 bases), and the modest error rates (approximately 1% per base) make such de novo assembly of mammalian genomes intractable. Instead, short-read sequence alignment algorithms have been developed to compare each short sequence to a reference genome [7-12]. Variants are identified by observing multiple reads that differ from the reference sequence in the same way in their respective alignments. These alignment algorithms have made it possible to accurately and efficiently catalogue many types of variation between human individuals, including variants causative for specific diseases.
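To make the last step concrete (calling a variant when several independently aligned reads disagree with the reference in the same way), here is a minimal sketch of a naive pileup-based SNV caller. It is not SRMA, which re-aligns reads locally; it only illustrates the kind of per-position evidence counting that follows independent alignment. It assumes gapless alignments given as hypothetical `(start, sequence)` pairs with 0-based starts, ignores indels and base qualities, and uses illustrative thresholds (`min_alt_reads`, `min_alt_fraction`) that are not taken from the paper.

```python
from collections import Counter


def pileup(reference, reads):
    """Count the bases observed at each reference position.

    Assumes each read is a hypothetical (start, sequence) pair describing a
    gapless alignment with a 0-based start on `reference` (no indels).
    """
    columns = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):  # ignore bases hanging off the ends
                columns[pos][base] += 1
    return columns


def call_snvs(reference, reads, min_alt_reads=3, min_alt_fraction=0.2):
    """Report positions where enough reads share the same non-reference base.

    Requiring several concordant reads is what separates a real variant from
    a scattered ~1%-per-base sequencing error. Thresholds are illustrative.
    Returns (position, ref_base, alt_base, alt_count, depth) tuples.
    """
    calls = []
    for pos, counts in enumerate(pileup(reference, reads)):
        depth = sum(counts.values())
        ref_base = reference[pos]
        alts = [(base, n) for base, n in counts.items() if base != ref_base]
        if depth == 0 or not alts:
            continue
        # Most frequently observed non-reference base at this position.
        alt_base, alt_n = max(alts, key=lambda item: item[1])
        if alt_n >= min_alt_reads and alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_n, depth))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [
        (0, "ACTTAC"),  # carries G->T at position 2
        (1, "CTTACG"),  # carries G->T at position 2
        (2, "TTACGT"),  # carries G->T at position 2
        (0, "ACGTAC"),  # matches the reference
        (4, "ACGAAC"),  # single sequencing error (T->A) at position 7
    ]
    print(call_snvs(reference, reads))  # [(2, 'G', 'T', 3, 4)]
```

Three of the four reads covering position 2 agree on the same substitution, so it is reported, while the lone mismatch at position 7 is treated as noise. Because every read here is aligned on its own, a read spanning an indel near its end would simply be misplaced and contribute spurious mismatches; this is the limitation of independent alignment that the abstract describes.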