Assembly algorithms for next-generation sequence data.

Abstract

Next-generation sequencing is revolutionizing genomics, promising higher coverage at a lower cost per base than Sanger sequencing. However, the shorter reads and higher error rates of these new instruments necessitate new algorithms and software. This dissertation describes approaches to several problems in genome assembly with these short fragments.

We describe YASRA (Yet Another Short Read Assembler), which performs comparative assembly of short reads using a reference genome that can differ substantially from the genome being sequenced. We explain the algorithm and present the results of assembling one ancient mitochondrial dataset and one plastid dataset. Comparing YASRA with the AMOScmp-shortReads and Newbler mapping assemblers (version 2.0.00.17) as the template genome is varied, we find that YASRA generates fewer contigs, with higher coverage and fewer errors. We also analyze situations in which comparative assembly outperforms de novo assembly, and vice versa, and compare the performance of YASRA with that of the Velvet (version 0.7.53) and Newbler (version 2.0.00.17) de novo assemblers.

We use the concept of overlap graphs from YASRA to find genetic differences within a target species. We describe a simple pipeline for deducing such locations of variation in the presence of a reference genome, and then extend it to deduce polymorphisms in a species without the help of a reference genome. We describe DIAL (De Novo Identification of Alleles), our implementation of this algorithm. The method works even when coverage is insufficient for de novo assembly, and it can be extended to detect small indels (insertions/deletions). We evaluate the effectiveness of the approach by detecting heterozygous locations in published Roche/454 sequence data from Dr. James Watson. We also apply our approach to recent Illumina data from an orangutan, in each case comparing our results to those of computational analyses that used a reference genome sequence.
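
The abstract does not spell out DIAL's internals. Purely to illustrate the overlap idea, the toy Python sketch below (not the dissertation's DIAL implementation) compares every ordered pair of reads, finds the longest suffix/prefix overlap with at most one mismatch, and reports overlaps that disagree at exactly one base as candidate heterozygous sites. The names `best_overlap`, `single_mismatch_overlaps`, and `Candidate`, and the default `min_overlap` of 20 bases, are hypothetical choices for this example. The all-pairs scan is quadratic, and the sketch does not separate true alleles from sequencing errors; a real tool would need to filter them, for example by requiring several reads to support each allele.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    """A read-pair overlap that disagrees at exactly one base (hypothetical type)."""
    read_a: int   # index of the read contributing the overlap's suffix
    read_b: int   # index of the read contributing the overlap's prefix
    offset: int   # start of the overlap within read_a
    pos_a: int    # position of the mismatch within read_a
    base_a: str   # base seen in read_a at the mismatch
    base_b: str   # base seen in read_b at the mismatch


def best_overlap(a: str, b: str, min_overlap: int) -> Optional[Tuple[int, List[int]]]:
    """Scan offsets left to right (overlap length never grows) and return
    (offset, mismatch_indices) for the first overlap of a's suffix with b's
    prefix (or b contained in a) that has at least min_overlap bases and at
    most one mismatch. Returns None if no such overlap exists."""
    for offset in range(len(a) - min_overlap + 1):
        overlap_len = min(len(a) - offset, len(b))
        if overlap_len < min_overlap:
            continue
        mismatches = [i for i in range(overlap_len) if a[offset + i] != b[i]]
        if len(mismatches) <= 1:
            return offset, mismatches
    return None


def single_mismatch_overlaps(reads: List[str], min_overlap: int = 20) -> List[Candidate]:
    """Report ordered read pairs whose best overlap has exactly one mismatch."""
    candidates: List[Candidate] = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            hit = best_overlap(a, b, min_overlap)
            if hit is None:
                continue
            offset, mismatches = hit
            if len(mismatches) == 1:  # overlaps with zero mismatches are skipped
                k = mismatches[0]
                candidates.append(
                    Candidate(i, j, offset, offset + k, a[offset + k], b[k]))
    return candidates


if __name__ == "__main__":
    hap1 = "GATTACACCGTAGGCTAACTTGCAGTCCATGAAGCTTCGA"
    hap2 = hap1[:25] + "C" + hap1[26:]  # toy heterozygous SNP: T/C at position 25
    reads = [hap1[0:30], hap2[10:40]]   # one read from each haplotype
    for c in single_mismatch_overlaps(reads):
        print(c)
```

In the toy example, the two reads come from different haplotypes and overlap by 20 bases, so the script prints a single candidate with `base_a='T'` and `base_b='C'` at position 25 of the first read. Two reads taken from the same haplotype overlap exactly and are not reported.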

Bibliographic record

Author: Ratan, Aakrosh
Affiliation: The Pennsylvania State University
Degree-granting institution: The Pennsylvania State University
Subjects: Biology, Bioinformatics; Computer Science
Degree: Ph.D.
Year: 2009
Pages: 77
Original format: PDF
Text language: English

Similar documents

1. Assembly algorithms for next-generation sequencing data [J]. Miller JR, Koren S, Sutton G. Genomics, 2010, no. 6.
2. An MCMC algorithm for haplotype assembly from whole-genome sequence data [J]. Bansal V, Halpern AL, Axelrod N, et al. Genome Research, 2008, no. 8.
3. De novo assembly of the complete organelle genome sequences of azuki bean (Vigna angularis) using next-generation sequencers [J]. Ken Naito, Akito Kaga, Norihiko Tomooka, et al. Breeding Science, 2013, no. 2.
4. Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data [C]. Waddell PJ, Kishino H, Ota R. Workshop on Genome Informatics, 2002.
5. An Approach for Analyzing Genomic Sequences by Integrating Assembly-Based and Assembly-Free Algorithms [D]. Dam, Vi Ngoc Tuong. 2018.
6. Assembly Algorithms for Next-Generation Sequencing Data [O]. Jason R. Miller, Sergey Koren, Granger Sutton.
7. ConPADE: genome assembly ploidy estimation from next-generation sequencing data [O]. Gabriel R A Margarido, David Heckerman. 2015.