Phylogeny and Ancestral Genome Reconstruction from Gene Order using Maximum Likelihood and Binary Encoding.

Abstract

Over the long history of genome evolution, genomes are reshaped by events such as rearrangements, losses, insertions and duplications, which together change the order and content of genes along the genome. Recent progress in genome-scale sequencing renews the challenge of reconstructing phylogenies and ancestral genomes from gene-order data. These problems have attracted so much interest that a large number of algorithms, following various principles, have been developed over the past few years to tackle them. However, difficulties and limitations in performance and scalability largely prevent these algorithms from being applied to emerging whole-genome data. The study presented in this dissertation focuses on developing appropriate evolutionary models and robust algorithms for phylogenetic and ancestral inference from gene-order data under whole-genome evolution, along with their applications.

To reconstruct phylogenies from gene-order data, we developed a collection of closely related methods that follow the principle of likelihood maximization. To the best of our knowledge, this was the first successful attempt to apply maximum likelihood optimization to the gene-order phylogenetic problem. Later we proposed MLWD (in collaboration with Lin and Moret), in which we described an effective transition model that accounts for transitions between the presence and absence states of a gene adjacency. Besides genome rearrangements, other evolutionary events that modify gene content, such as gene duplications and gene insertions/deletions (indels), can be handled naturally as well. We present results from extensive testing on simulated data showing that our approach returns accurate results quickly.

With a known phylogeny, a subsequent problem is to reconstruct the gene order of ancestral genomes from their living descendants. To solve this problem, we adopted an adjacency-based probabilistic framework and developed a method called PMAG. PMAG decomposes gene orderings into a set of gene adjacencies and then infers the probability of observing each adjacency in the ancestral genome. We conducted extensive simulation experiments and compared PMAG with InferCarsPro, GASTS, GapAdj and SCJ. According to the results, PMAG performed strongly in terms of the true positive rate of gene adjacencies. PMAG also achieved running times comparable to those of the other methods, even when the traveling salesman problem (TSP) was solved exactly.

Although PMAG performs well, it is restricted to datasets that have undergone only rearrangements. To infer ancestral genomes under a more general model of evolution with an arbitrary rate of indels, we proposed PMAG+, an enhanced method based on PMAG. PMAG+ includes a novel approach to inferring ancestral gene content and a detailed description of how to reduce the adjacency assembly problem to an instance of TSP. We designed a series of experiments to validate PMAG+ and compared the results with GapAdj, the most recent comparable method. According to the results, the ancestral gene content predicted by PMAG+ closely matched the actual content, with error rates below 1%. Under various degrees of indels, PMAG+ consistently predicted ancestral gene orders more accurately and, at the same time, produced contigs very close to the actual chromosomes.
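The abstract describes decomposing gene orderings into gene adjacencies and modeling the presence or absence of each adjacency. The following is a minimal Python sketch of that idea, not the dissertation's implementation. The function names (`extremities`, `adjacencies`, `binary_encode`), the head/tail representation of gene ends, and the toy genomes are assumptions made for illustration. The sketch also assumes each gene appears at most once per genome and ignores chromosome ends (telomeres) on linear chromosomes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a gene end is (gene id, 'h' for head or 't' for tail).
Extremity = Tuple[int, str]
Adjacency = Tuple[Extremity, Extremity]


def extremities(gene: int) -> Tuple[Extremity, Extremity]:
    """Return the (entry, exit) ends of a signed gene read left to right."""
    g = abs(gene)
    # A forward gene is read tail -> head; a reversed gene is read head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(chromosome: List[int], circular: bool = False) -> Set[Adjacency]:
    """Decompose a signed gene order into its set of unordered adjacencies."""
    pairs = list(zip(chromosome, chromosome[1:]))
    if circular and chromosome:
        pairs.append((chromosome[-1], chromosome[0]))
    result: Set[Adjacency] = set()
    for left, right in pairs:
        _, exit_end = extremities(left)
        entry_end, _ = extremities(right)
        # Sorting makes the pair orientation-independent.
        first, second = sorted((exit_end, entry_end))
        result.add((first, second))
    return result


def binary_encode(
    genomes: Dict[str, List[int]]
) -> Tuple[List[Adjacency], Dict[str, List[int]]]:
    """Encode each genome as a 0/1 vector over all observed adjacencies."""
    per_genome = {name: adjacencies(order) for name, order in genomes.items()}
    columns = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [1 if adj in adjs else 0 for adj in columns]
        for name, adjs in per_genome.items()
    }
    return columns, matrix


if __name__ == "__main__":
    # Toy data: genome B differs from A by an inversion of gene 2.
    toy = {"A": [1, 2, 3], "B": [1, -2, 3]}
    columns, matrix = binary_encode(toy)
    for name, row in matrix.items():
        print(name, row)
```

Each column of the resulting matrix is a two-state character (adjacency present or absent), which is the kind of input a two-state transition model such as the one mentioned for MLWD could operate on. Reading a chromosome from the other strand yields the same adjacency set, so the encoding does not depend on the direction in which a chromosome is written.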

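Bibliographic record

  • Author: Hu, Fei
  • Author affiliation: University of South Carolina
  • Degree-granting institution: University of South Carolina
  • Subject: Computer science; Bioinformatics; Genetics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 83 p.
  • Total pages: 83
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords: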
