Evolutionary Bioinformatics Online

Maximum Likelihood Analyses of 3490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling

Abstract

The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of inferring these, for many organismic groups a sufficient number of taxa is often available for only one or a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as examples several large nested single-gene rbcL alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random; instead, subsamples are drawn from empirical taxon groups, which can either be user-defined or determined from taxonomic information in databases. Our results indicate that, despite an unfavorable ratio of number of sequences to number of base pairs (i.e., many relatively short sequences), Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and for interpreting bootstrap support from comprehensive analyses.
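
The group-based taxon subsampling described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the procedure's details, so everything specific here is an assumption made for illustration: the function name group_jackknife, the keep_fraction and min_per_group parameters, the rule of keeping at least one taxon per group, and the toy taxon-to-group table. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def group_jackknife(taxon_to_group, keep_fraction=0.5, min_per_group=1, seed=None):
    """Draw one taxon subsample by sampling within each taxon group.

    Hypothetical sketch: taxa are drawn without replacement inside every
    group, so each group stays represented in the reduced taxon set.
    """
    rng = random.Random(seed)

    # Collect the members of each group (user-defined or taken from a taxonomy).
    groups = defaultdict(list)
    for taxon, group in taxon_to_group.items():
        groups[group].append(taxon)

    subsample = []
    for group in sorted(groups):
        members = sorted(groups[group])
        # Keep the requested fraction (rounded down), but at least
        # min_per_group taxa and never more than the group contains.
        k = int(keep_fraction * len(members))
        k = min(len(members), max(min_per_group, k))
        subsample.extend(rng.sample(members, k))
    return subsample


# Toy input (assumed for illustration): genus-level taxa grouped by order.
taxa = {
    "Quercus": "Fagales",
    "Fagus": "Fagales",
    "Betula": "Fagales",
    "Rosa": "Rosales",
    "Malus": "Rosales",
    "Prunus": "Rosales",
    "Fragaria": "Rosales",
    "Ginkgo": "Ginkgoales",
}

sample = group_jackknife(taxa, keep_fraction=0.5, seed=1)
sampled_groups = {taxa[t] for t in sample}
```

With these settings the sketch keeps one taxon from Fagales, two from Rosales, and the single Ginkgoales taxon. Repeating the draw with different seeds would give a set of reduced taxon lists, each of which could then be used to extract a sub-alignment for a separate tree search.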