Conference proceedings: Bioinformatics Research and Applications

A Metric for Phylogenetic Trees Based on Matching



Abstract

Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, similarity measures based on maximum agreement are too strict, while measures based on the elimination of rogue taxa work poorly when the proportion of rogue taxa is significant; distance measures based on edit distances under simple tree operations (such as nearest-neighbor interchange or subtree pruning and regrafting) are NP-hard; and the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes—reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce an entirely new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
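The abstract's remark about the Robinson-Foulds distance can be illustrated with a minimal sketch. It is not taken from the paper: the nested-tuple tree encoding and the helper names `caterpillar`, `splits`, and `rf_distance` are assumptions made for this example. It computes the Robinson-Foulds distance as the number of nontrivial bipartitions found in exactly one of two trees. Moving a single leaf from one end of a caterpillar tree to the other leaves no bipartition in common, so the distance reaches its maximum of 2(n - 3) for binary trees on n leaves.

```python
def caterpillar(order):
    """Build a caterpillar (ladder-shaped) tree as nested 2-tuples."""
    tree = (order[0], order[1])
    for leaf in order[2:]:
        tree = (tree, leaf)
    return tree


def splits(tree):
    """Return the nontrivial bipartitions (splits) of the tree's leaf set."""
    clades = []

    def leaves_below(node):
        # A leaf is any value that is not a tuple.
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        clades.append(below)
        return below

    all_leaves = leaves_below(tree)
    result = set()
    for side in clades:
        other = all_leaves - side
        # Keep only splits with at least two leaves on each side;
        # storing both sides makes the split independent of the rooting.
        if len(side) >= 2 and len(other) >= 2:
            result.add(frozenset([side, other]))
    return result


def rf_distance(a, b):
    """Robinson-Foulds distance: splits present in exactly one tree."""
    return len(splits(a) ^ splits(b))


n = 10
t1 = caterpillar(list(range(1, n + 1)))        # leaf 1 at one end
t2 = caterpillar(list(range(2, n + 1)) + [1])  # leaf 1 moved to the other end

print(rf_distance(t1, t2), 2 * (n - 3))  # prints: 14 14
```

The two trees differ only in where leaf 1 is attached, yet every internal edge of one tree is absent from the other. This is the kind of sensitivity to a very small change that the abstract describes.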

Bibliographic details

  • Conference location: Changsha, China
  • Author affiliation (listed identically for all three author entries):

    Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), EPFL-IC-LCBB, INJ 230, Station 14, CH-1015 Lausanne, Switzerland

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Bioengineering (biotechnology)

  • Date indexed: 2022-08-26 14:07:53
