Assembly algorithms for next-generation sequence data.

Abstract

Next-generation sequencing is revolutionizing genomics, promising higher coverage at a lower cost per base than Sanger sequencing. The shorter reads and higher error rates of these new instruments necessitate the development of new algorithms and software. This dissertation describes approaches to several problems in genome assembly from these short fragments.

We describe YASRA (Yet Another Short Read Assembler), which performs comparative assembly of short reads using a reference genome that can differ substantially from the genome being sequenced. We explain the algorithm and present the results of assembling one ancient-mitochondrial and one plastid dataset. Comparing the performance of YASRA with the AMOScmp-shortReads and Newbler mapping assemblers (version 2.0.00.17) as template genomes are varied, we find that YASRA generates fewer contigs with higher coverage and fewer errors. We also analyze situations where comparative assembly outperforms de novo assembly, and vice versa, and compare the performance of YASRA with that of the Velvet (version 0.7.53) and Newbler (version 2.0.00.17) de novo assemblers.

We use the concept of "overlap graphs" from YASRA to find genetic differences within a target species. We describe a simple pipeline for deducing such locations of variation in the presence of a reference genome, and then extend it to deduce polymorphisms in a species without the help of a reference genome. We describe DIAL (De Novo Identification of Alleles), our implementation of this algorithm. The method works even when the coverage is insufficient for de novo assembly and can be extended to determine small indels (insertions/deletions). We evaluate the effectiveness of the approach by using published Roche/454 sequence data of Dr. James Watson to detect heterozygous locations. We also apply our approach to recent Illumina data from orangutan, in each case comparing our results to those from computational analysis that used a reference genome sequence.

Bibliographic record

  • Author: Ratan, Aakrosh
  • Author affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 77 p.
  • Total pages: 77
  • Original format: PDF
  • Language: English
