Journal of Molecular Evolution

When being 'most likely' is not enough: Examining the performance of three uses of the parametric bootstrap in phylogenetics

Abstract

I show that three parametric-bootstrap (PB) applications that have been proposed for phylogenetic analysis can be misleading as currently implemented. First, I consider simulating a topology estimated from preliminary data in order to determine the sequence length at which the best tree obtained from more extensive data should be correct with a desired probability. This procedure delivers an accurate estimate of that length only in topological situations in which most preliminary trees are expected to be both correct and statistically significant, i.e. when no further analysis would be needed. Otherwise, one obtains strong underestimates of the length, or similarly biased values for incorrect trees. Second, I consider PB-based topology tests that take as the null hypothesis the most likely tree congruent with a pre-specified topological relationship (an alternative to the unconstrained most likely tree) and simulate this tree for P value estimation. These tests produce excessive type I error (from 50% to 600% and higher) when applied to null data generated by star-shaped or dichotomous four-taxon topologies. Simulating the most likely star topology for P value estimation instead yields correct type I error rates, even when the null data are generated by a dichotomous topology. This is a strong indication that the star topology is the correct default null hypothesis for phylogenies. Third, I show that PB-estimated confidence intervals (CIs) for the length of a tree branch are generally accurate, although in some situations they can be strongly over- or under-estimated relative to the "true" CI. Attempts to identify a biased CI through a further round of simulations were unsuccessful. Tracing the origin and propagation of parameter estimate error through the CI estimation exercise showed that sparseness of the site patterns that are crucial to the estimation of pivotal parameters can allow homoplasy to bias these estimates and, ultimately, the PB-based CI estimation.
In conclusion, I stress that statistical techniques that simulate models estimated from limited data need to be carefully calibrated. I also defend the point that pattern-sparseness assessment will be the next frontier in the statistical analysis of phylogenies, an effort that will require taking advantage of both the merits of black-box maximum-likelihood approaches and the insights of intuitive, site-pattern-oriented approaches such as parsimony.
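
As a rough illustration of the third application, the sketch below computes a PB percentile confidence interval for a single branch length. It is a deliberately simplified stand-in, not the procedure evaluated in the paper: it assumes only two sequences, the Jukes-Cantor (JC69) model, and a closed-form distance estimate rather than maximum likelihood on a four-taxon tree. The function names (`jc69_distance`, `jc69_p`, `pb_branch_ci`) and the example counts are hypothetical.

```python
import math
import random


def jc69_distance(p):
    """JC69 distance (substitutions per site) from the proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # saturated: the JC69 estimate is not finite
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_p(d):
    """Expected proportion of differing sites at JC69 distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def pb_branch_ci(n_sites, n_diff, replicates=2000, alpha=0.05, seed=0):
    """Percentile parametric-bootstrap CI for a two-sequence JC69 branch length.

    Assumption: sites are independent, so the number of differing sites in a
    simulated alignment is binomial with the probability implied by the
    estimated branch length.
    """
    rng = random.Random(seed)
    d_hat = jc69_distance(n_diff / n_sites)  # estimate from the observed data
    p_sim = jc69_p(d_hat)                    # model fitted to the observed data
    sims = []
    for _ in range(replicates):
        # Simulate one replicate alignment from the fitted model ...
        k = sum(rng.random() < p_sim for _ in range(n_sites))
        # ... and re-estimate the branch length from it.
        sims.append(jc69_distance(k / n_sites))
    sims.sort()
    lower = sims[int((alpha / 2) * replicates)]
    upper = sims[int((1 - alpha / 2) * replicates) - 1]
    return d_hat, lower, upper


if __name__ == "__main__":
    d_hat, lower, upper = pb_branch_ci(n_sites=500, n_diff=50)
    print(f"estimate {d_hat:.4f}, 95% PB interval [{lower:.4f}, {upper:.4f}]")
```

Because the replicates are simulated from the fitted value `d_hat` rather than from the true branch length, any bias in `d_hat` carries over into the interval, which is the kind of error propagation the abstract describes.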
