
Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study



Abstract

Background: Phylogenetic reconstruction is a necessary first step in many analyses that use whole genome sequence data from bacterial populations. Many methods are available to infer phylogenies, each with its own advantages and disadvantages, but few unbiased comparisons across the range of approaches have been made.

Methods: We simulated data from a defined 'true tree' using a realistic evolutionary model. We built phylogenies from these data using a range of methods and compared the reconstructed trees to the true tree using two measures, noting the computational time needed for each reconstruction. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree.

Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other.

Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. For the most accurate tree, use of either RAxML or IQ-TREE with an alignment of variable sites produced by mapping to a reference genome is best. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.
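
The abstract does not name the two measures used to compare reconstructed trees with the true tree. As an illustration only, the sketch below computes the Robinson-Foulds distance, one commonly used topology measure, which counts the splits present in one tree but not the other. The nested-tuple tree representation, the example trees, and the function names (`leaves`, `splits`, `robinson_foulds`) are assumptions made for this example and are not taken from the study's released code.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions induced by internal edges."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        other = all_leaves - clade
        # Skip the root and trivial splits (one leaf versus the rest).
        if len(clade) > 1 and len(other) > 1:
            # Store the split without orientation, so that the two child
            # edges of the root are counted as one unrooted edge.
            found.add(frozenset([clade, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-taxon example: the inferred tree misplaces B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(true_tree, true_tree))
print(robinson_foulds(true_tree, inferred_tree))
```

A distance of zero means the two topologies are identical. This measure ignores branch lengths, so a comparison that also cares about branch length accuracy would need a second, length-aware measure.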
