Journal: BMC Bioinformatics

A support vector machine based test for incongruence between sets of trees in tree space



Abstract

Background: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees deviates significantly from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer.

Results: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we have developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two pre-defined sets of trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also incorporate tree reconstruction into its framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, receiver operating characteristic (ROC) curves indicated that GeneOut performed well in detecting differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.

Conclusions: The nonparametric nature of our statistical test provides fast and efficient analyses, and makes it applicable to any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
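The test described in the abstract (embed the trees as vectors, measure the SVM separation between the two sets, then judge that separation against label permutations) can be illustrated with a minimal sketch. This is not GeneOut's implementation. It assumes the trees have already been mapped to fixed-length numeric vectors, uses a simple linear SVM trained by Pegasos-style stochastic subgradient descent in place of whatever SVM solver the software uses, and takes training accuracy as the separation statistic. The function names `train_linear_svm`, `separation`, and `permutation_p_value` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Fit a linear soft-margin SVM with Pegasos-style subgradient steps.

    X: list of feature vectors (assumed tree embeddings); y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            # Shrink the weights (regularisation part of the subgradient).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss part: only points inside the margin push the weights.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def separation(X, y):
    """Separation statistic (an assumption of this sketch): training accuracy."""
    w = train_linear_svm(X, y)
    hits = 0
    for x, label in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
        if label * score > 0:
            hits += 1
    return hits / len(X)


def permutation_p_value(set_a, set_b, n_perm=99, seed=1):
    """Return (observed separation, permutation p-value) for two vector sets."""
    X = list(set_a) + list(set_b)
    y = [1] * len(set_a) + [-1] * len(set_b)
    observed = separation(X, y)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break any link between vectors and set labels
        if separation(X, shuffled) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (1 + at_least) / (1 + n_perm)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for two sets of gene trees.
    rng = random.Random(42)
    set_a = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
    set_b = [(rng.gauss(-3, 0.5), rng.gauss(-3, 0.5)) for _ in range(20)]
    observed, p = permutation_p_value(set_a, set_b)
    print(f"separation={observed:.2f}, p={p:.3f}")
```

Because the null distribution is built by reshuffling the set labels, the sketch needs no parametric model of how trees are distributed, which is the property the abstract highlights. The smallest attainable p-value is 1 / (n_perm + 1), so detecting small differences at strict significance levels requires more permutations.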
