
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses


Abstract

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause of gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multi-species coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: , and the software is available at .
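
The weighting step described in the abstract can be illustrated with a minimal Python sketch. It assumes that "weighting by bin size" is realized by passing each bin's re-calculated tree to the summary method once per gene in that bin; the abstract does not spell out the mechanism, so this is one plausible reading rather than the authors' implementation. The function name `weight_supergene_trees`, the bin identifiers, and the Newick strings are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def weight_supergene_trees(
    bins: Dict[str, List[str]],
    supergene_trees: Dict[str, str],
) -> List[str]:
    """Repeat each bin's re-calculated tree once per gene in the bin.

    bins maps a (hypothetical) bin id to the gene names assigned to it.
    supergene_trees maps the same bin id to the Newick string of the
    tree re-calculated from that bin's combined genes.
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        tree = supergene_trees[bin_id]
        # Assumption: weight = bin size, realized by replicating the tree.
        weighted.extend([tree] * len(genes))
    return weighted


# Toy input: two bins, one with three genes and one with a single gene.
bins = {"bin1": ["g1", "g2", "g3"], "bin2": ["g4"]}
supergene_trees = {
    "bin1": "((A,B),(C,D));",
    "bin2": "((A,C),(B,D));",
}

weighted = weight_supergene_trees(bins, supergene_trees)
for newick in weighted:
    print(newick)
```

Under this reading, the list handed to a summary method such as MP-EST has as many entries as there are original genes, so a large bin influences the species tree estimate in proportion to the number of loci it represents.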
