Journal: Molecular Biology and Evolution

On inconsistency of the Neighbor-Joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled

Abstract

Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding (Mol. Biol. Evol. 1:125-142, 1983) and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into the reasons that long branch attraction tends to be a common form of inconsistency and reasons that other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.
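
The two effects described in the abstract, underestimated distances and long branch attraction, can be illustrated with a small numerical sketch. This is not the paper's derivation. It assumes a Jukes-Cantor model whose site rates follow a gamma distribution with shape `alpha`, and applies the standard Jukes-Cantor correction that ignores rate variation. The four-taxon tree ((A,B),(C,D)), its branch lengths, and `alpha = 0.5` are hypothetical values chosen for illustration. Distances are evaluated in the infinite-sequence-length limit, and the topology is chosen by the smallest sum of within-pair distances, which for four taxa is the split that Neighbor-Joining selects.

```python
import math

# Hypothetical four-taxon tree ((A,B),(C,D)): A and C sit on long branches,
# B and D on short ones. All values are illustrative assumptions.
BRANCH = {"A": 1.0, "B": 0.05, "C": 1.0, "D": 0.05}
INTERNAL = 0.05
ALPHA = 0.5  # assumed gamma shape for rates across sites


def true_distance(x, y):
    """Path length between two taxa on the assumed tree."""
    same_side = {x, y} in ({"A", "B"}, {"C", "D"})
    return BRANCH[x] + BRANCH[y] + (0.0 if same_side else INTERNAL)


def jc_gamma_p(d, alpha):
    """Expected fraction of differing sites under Jukes-Cantor
    with gamma-distributed site rates (shape alpha, mean 1)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))


def jc_distance(p):
    """Standard Jukes-Cantor correction, which assumes equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def limiting_estimate(d, alpha=ALPHA):
    """Distance the misspecified estimator converges to as sequences grow."""
    return jc_distance(jc_gamma_p(d, alpha))


def misspecified_distance(x, y):
    """Limiting estimated distance between two taxa when rate variation is ignored."""
    return limiting_estimate(true_distance(x, y))


def best_split(dist):
    """Return the four-taxon split with the smallest sum of within-pair distances."""
    sums = {
        "AB|CD": dist("A", "B") + dist("C", "D"),
        "AC|BD": dist("A", "C") + dist("B", "D"),
        "AD|BC": dist("A", "D") + dist("B", "C"),
    }
    return min(sums, key=sums.get)


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0):
        print(f"true d = {d:.2f}, limiting estimate = {limiting_estimate(d):.4f}")
    print("split from true distances:        ", best_split(true_distance))
    print("split from misspecified distances:", best_split(misspecified_distance))
```

Under these assumed values the limiting estimate is smaller than the true distance, and the shortfall grows with the distance. Because the long-to-long distance A-C is compressed the most, the smallest-sum criterion selects AC|BD instead of the true AB|CD, and it does so in the limit of infinite data, which is why adding more sites would not correct the error.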
