Conference: Annual International Conference on Research in Computational Molecular Biology

Maximum Likelihood on Four Taxa Phylogenetic Trees: Analytic Solutions


Abstract

Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein, 1981), but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used to find optimal parameters for a given tree. So far, analytic solutions have been derived only for the simplest model: three taxa, two-state characters, under a molecular clock (MC). Quoting Ziheng Yang (2000), who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation". In this work, we give analytic solutions for four taxa, two-state characters, under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, together with the resulting system of polynomial equations. In full generality, it is infeasible to solve this system; therefore, specialized tools for the MC case are developed. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of MC trees with the Hadamard conjugation (Hendy and Penny, 1993) to derive a number of topology-dependent identities. Employing these identities, we substantially simplify the system of polynomial equations. Finally, we use tools from algebraic geometry (e.g. Gröbner bases, ideal saturation, resultants) and employ symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data) for the fork topology, and analytic solutions for the comb. We show that, in contrast to the fork, the comb has no closed-form solutions (expressed by radicals in the input data). In general, four-taxa trees can have multiple ML points (Steel, 1994; Chor et al., 2001). In contrast, we can now prove that under the MC assumption, both the fork and the comb topologies have a unique (local and global) ML point.
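
To make the objects named in the abstract concrete, the following is a minimal Python sketch of the Hadamard conjugation for a clock-like fork tree. It is not the authors' derivation. It assumes the symmetric two-state (Neyman) model, indexes site patterns by the subset of taxa 1-3 whose state differs from taxon 4 (one bit per taxon), and uses hypothetical parameter names: `a` and `b` for the heights of the two cherries and `h` for the root height. The functions `hadamard_spectrum`, `fork_spectrum` and `log_likelihood` are illustrative names, not part of any library.

```python
import math


def hadamard_spectrum(q):
    """Expected pattern probabilities s = H^{-1} exp(H q).

    q is an edge-length spectrum of length 2**(n-1); q[0] must equal
    minus the sum of the other entries.
    """
    m = len(q)

    def h(i, j):
        # Sylvester Hadamard entry: (-1) ** |i AND j|
        return -1 if bin(i & j).count("1") % 2 else 1

    rho = [math.exp(sum(h(i, j) * q[j] for j in range(m))) for i in range(m)]
    # For the Sylvester matrix, H^{-1} = H / m.
    return [sum(h(i, j) * rho[j] for j in range(m)) / m for i in range(m)]


def fork_spectrum(a, b, h):
    """Pattern probabilities for the clock-like fork ((1,2),(3,4)).

    Assumed parametrization: cherry (1,2) at height a, cherry (3,4)
    at height b, root at height h, with 0 < a <= h and 0 < b <= h.
    """
    if not (0 < a <= h and 0 < b <= h):
        raise ValueError("need 0 < a <= h and 0 < b <= h")
    q = [0.0] * 8
    q[0b001] = a                  # pendant edge of taxon 1
    q[0b010] = a                  # pendant edge of taxon 2
    q[0b100] = b                  # pendant edge of taxon 3
    q[0b111] = b                  # pendant edge of taxon 4: split {1,2,3}|{4}
    q[0b011] = (h - a) + (h - b)  # internal edge: split {1,2}|{3,4}
    q[0] = -sum(q[1:])
    return hadamard_spectrum(q)


def log_likelihood(counts, s):
    """Log-likelihood of observed pattern counts under probabilities s."""
    return sum(c * math.log(p) for c, p in zip(counts, s) if c)


if __name__ == "__main__":
    s = fork_spectrum(a=0.1, b=0.2, h=0.3)
    for pattern, prob in enumerate(s):
        print(format(pattern, "03b"), prob)
```

A numeric optimizer of the kind the abstract mentions (hill climbing, EM) would search over `a`, `b` and `h` to maximize `log_likelihood` for observed counts. The symmetry visible in the output (patterns `001` and `010` have the same probability, as do `100` and `111`) follows directly from the clock constraint in this sketch. Whether it matches the specific identities derived in the paper is not stated in the abstract.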
