Journal: BMC Genomics

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees


Abstract

Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically, errors are detected at the sequence level, but when they are missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks k leaves whose removal maximally reduces the tree diameter. We present an algorithm that finds the exact solution to this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and, once the amount of filtering is controlled, often reduces gene tree discordance more. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink.
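
To make the k-shrink problem concrete, here is a minimal Python sketch of the problem statement only. It is not the polynomial-time algorithm the abstract refers to, and it is not TreeShrink's code: it tries every set of k leaves by brute force, which is exponential in k and only practical for toy trees. The edge-list tree representation, the function names, and the example tree are all assumptions made for illustration. The sketch relies on the fact that removing leaves does not change path lengths between the remaining leaves, so the diameter of the pruned tree is the largest pairwise distance among the kept leaves.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist), where dist[(a, b)] is the path length between leaves a and b.

    `edges` is a hypothetical representation: a list of (node, node, branch_length).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        # Iterative depth-first traversal from each leaf, accumulating branch lengths.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node != src and len(adj[node]) == 1:
                dist[(src, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Largest pairwise distance among the given leaves (0.0 if fewer than two)."""
    return max((dist[(a, b)] for a, b in combinations(leaves, 2)), default=0.0)


def k_shrink_bruteforce(edges, k):
    """Try every set of k leaves; return (removed_set, resulting_diameter) with minimal diameter."""
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(leaves, k):
        kept = [x for x in leaves if x not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (set(removed), d)
    return best


# Toy tree (made up for illustration): leaf D sits on a branch ten times longer than the rest.
example_edges = [
    ("A", "u", 1), ("B", "u", 1), ("u", "v", 1),
    ("C", "v", 1), ("D", "v", 10),
]
full_diameter = k_shrink_bruteforce(example_edges, 0)[1]   # 12.0 (A to D)
removed, shrunk_diameter = k_shrink_bruteforce(example_edges, 1)  # removes D, diameter 3.0
```

In this toy tree, removing the single leaf D cuts the diameter from 12 to 3, which is the kind of disproportionate impact the statistical tests described above are meant to flag.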
