Consistency and convergence rate of phylogenetic inference via regularization

Abstract

It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct “gene tree.” Although the gene tree may deviate from the “species tree” due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is “adaptive fast converging,” meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
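
The estimator described above can be written down schematically. The following is only a sketch of the general shape of a penalized maximum likelihood estimator with a squared BHV penalty; the symbols are assumptions introduced here for illustration and are not taken from the paper: $k$ is the sequence length, $\ell_k(T)$ is the log-likelihood of a candidate gene tree $T$ (topology together with edge lengths), $T_0$ is the guide (species) tree, $d_{\mathrm{BHV}}$ is the Billera-Holmes-Vogtmann geodesic distance, and $\alpha_k > 0$ is a regularization weight. The normalization by $k$ is likewise an assumption.

```latex
% Sketch only: notation and scaling are assumed, not quoted from the paper.
\[
  \hat{T}_k \;\in\; \operatorname*{arg\,max}_{T}
  \left\{ \frac{1}{k}\,\ell_k(T) \;-\; \alpha_k \, d_{\mathrm{BHV}}(T, T_0)^2 \right\}
\]
```

Under this reading, the first term rewards fit to the sequence data and the second pulls the estimate toward the guide tree; setting $\alpha_k = 0$ would recover ordinary maximum likelihood.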