Journal of Mathematical Biology

Information geometry for phylogenetic trees




We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback-Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
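The claim that the Fisher information metric is infinitesimally proportional to the Kullback-Leibler divergence can be checked numerically in the smallest possible case. The sketch below is only an illustration and is not taken from the paper: it assumes a "tree" with two leaves joined by a single path of length t, and a two-state symmetric model in which the two leaf states differ with probability (1 - exp(-2t))/2. Both the rate convention and all function names are assumptions made for this example. Under those assumptions, KL(p_t || p_{t+delta}) should approach 0.5 * I(t) * delta^2 as delta shrinks, where I(t) is the Fisher information in t.

```python
import math


def pattern_probs(t):
    """Joint distribution of the states at two leaves joined by a path of length t.

    Assumed model (not from the source): two-state symmetric process with
    P(states differ) = (1 - exp(-2 t)) / 2 and a uniform root distribution.
    Order of patterns: 00, 11, 01, 10.
    """
    e = math.exp(-2.0 * t)
    same = 0.25 * (1.0 + e)
    diff = 0.25 * (1.0 - e)
    return [same, same, diff, diff]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def fisher_information(t):
    """Fisher information I(t) = sum_x (dp_x/dt)^2 / p_x for the model above."""
    e = math.exp(-2.0 * t)
    derivs = [-0.5 * e, -0.5 * e, 0.5 * e, 0.5 * e]  # d/dt of each pattern probability
    probs = pattern_probs(t)
    return sum(d * d / p for d, p in zip(derivs, probs))


def kl_to_fisher_ratio(t, delta):
    """KL(p_t || p_{t+delta}) divided by 0.5 * I(t) * delta^2."""
    kl = kl_divergence(pattern_probs(t), pattern_probs(t + delta))
    return kl / (0.5 * fisher_information(t) * delta ** 2)


if __name__ == "__main__":
    for delta in (1e-1, 1e-2, 1e-3):
        print(delta, kl_to_fisher_ratio(0.5, delta))
```

Running the script prints the ratio for decreasing step sizes; under the stated assumptions it should move toward 1 as delta decreases. This toy case has a single parameter, whereas the geometry described above lives on a space of trees and forests with many edge parameters.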




