
Fast and Accurate Phylogenetic Reconstruction from High-Resolution Whole-Genome Data and a Novel Robustness Estimator



Abstract

The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results—an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. Availability: a copy of the software is available on demand.
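The abstract does not describe the algorithm itself, so the following is only an illustration of what a rearrangement-based comparison measures, not the method of the paper. It is a minimal sketch of the breakpoint distance, a classic and simple measure between two gene orders, assuming each genome is a single linear chromosome written as a signed permutation of the same syntenic blocks. The function names `adjacencies` and `breakpoint_distance` are hypothetical and chosen for this example.

```python
def adjacencies(order):
    """Return the set of adjacencies of a signed linear gene order.

    The order is framed with caps 0 and n+1 so that chromosome ends count.
    Each adjacency (x, y) is stored in canonical form, so that reading the
    same pair on the opposite strand, as (-y, -x), yields the same key.
    """
    framed = [0] + list(order) + [len(order) + 1]
    result = set()
    for x, y in zip(framed, framed[1:]):
        result.add(min((x, y), (-y, -x)))
    return result


def breakpoint_distance(a, b):
    """Count adjacencies of genome `a` that are not conserved in genome `b`.

    Assumption for this sketch: both genomes contain exactly the same
    blocks, each once, with a sign giving its orientation.
    """
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same syntenic blocks")
    return len(adjacencies(a) - adjacencies(b))


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # blocks 2..3 inverted
    print(breakpoint_distance(reference, inverted))
```

Each call takes time linear in the number of blocks, so comparing every pair among k genomes of n blocks costs on the order of k² · n with this measure. That figure describes this sketch only and says nothing about the scaling reported in the abstract.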

Bibliographic record

  • Source
    Comparative Genomics | 2010 | pp. 137-148 | 12 pages
  • Conference location: Ottawa (CA)
  • Author affiliations

    Laboratory for Computational Biology and Bioinformatics, EPFL,EPFL-IC-LCBB INJ230, Station 14, CH-1015 Lausanne, Switzerland;


  • Original format: PDF
  • Language of text: English
  • CLC classification: Bioengineering (biotechnology)
  • Indexed: 2022-08-26 14:07:00
