Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

Abstract

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can cause discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multispecies coalescent, is considered a dominant cause of gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees; many of them operate by combining estimated gene trees and are therefore called "summary methods". Because summary methods are generally fast (much faster than more complex coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and that many biological datasets have substantial gene tree estimation error, so summary methods may not be highly accurate under biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning uses a simple heuristic to evaluate the "combinability" of genes and then uses the resulting larger sets of genes to re-calculate gene trees. It has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent while maintaining its good empirical performance.
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation and is also statistically consistent under the multispecies coalescent model. New data used in this study are available at DOI: , and the software is available at .
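
The abstract describes three steps: test pairs of genes for combinability, re-estimate one tree per bin of combinable genes, and weight each re-estimated tree by its bin size. The Python sketch below illustrates the first and last steps under stated assumptions; it is not the authors' software. It assumes every gene tree is summarized as a dict mapping bipartitions (one side, as a frozenset of taxon names over a single shared taxon set) to branch support, treats two genes as not combinable when a strongly supported branch of one conflicts with a strongly supported branch of the other (roughly the heuristic of Mirarab et al. 2014), and uses plain greedy coloring of the resulting incompatibility graph rather than a balanced bin assignment. The function names, the 0.75 threshold, and the placeholder tree labels are hypothetical, and estimating the tree for each bin is not shown.

```python
from itertools import combinations


def compatible(a, b, taxa):
    """Bipartitions a|taxa-a and b|taxa-b are compatible iff at least one
    of the four intersections between their sides is empty."""
    ca, cb = taxa - a, taxa - b
    return not (a & b and a & cb and ca & b and ca & cb)


def incompatible_genes(g1, g2, taxa, threshold):
    """Hypothetical combinability test: two genes conflict if a branch with
    support >= threshold in one is incompatible with such a branch in the other."""
    strong1 = [s for s, sup in g1.items() if sup >= threshold]
    strong2 = [s for s, sup in g2.items() if sup >= threshold]
    return any(not compatible(a, b, taxa) for a in strong1 for b in strong2)


def greedy_bins(genes, taxa, threshold):
    """Put genes into bins so that no bin holds two conflicting genes.
    Greedy coloring of the incompatibility graph (a simplification: it does
    not try to balance bin sizes)."""
    names = sorted(genes)
    conflicts = {n: set() for n in names}
    for x, y in combinations(names, 2):
        if incompatible_genes(genes[x], genes[y], taxa, threshold):
            conflicts[x].add(y)
            conflicts[y].add(x)
    bins = []
    # Visit genes with the most conflicts first; ties keep name order.
    for n in sorted(names, key=lambda g: -len(conflicts[g])):
        for b in bins:
            if not conflicts[n] & b:
                b.add(n)
                break
        else:
            bins.append({n})
    return bins


def weighted_gene_trees(bins, bin_trees):
    """Weighted binning: repeat each bin's re-estimated tree once per gene
    in the bin, so a bin of k genes contributes weight k."""
    out = []
    for b, tree in zip(bins, bin_trees):
        out.extend([tree] * len(b))
    return out


if __name__ == "__main__":
    taxa = frozenset("ABCDE")
    # Each gene: {one side of a bipartition: branch support}
    genes = {
        "g1": {frozenset("AB"): 0.95},
        "g2": {frozenset("AC"): 0.90},
        "g3": {frozenset("AB"): 0.40, frozenset("CD"): 0.80},
    }
    bins = greedy_bins(genes, taxa, threshold=0.75)
    print(bins)  # g2 alone; g1 and g3 share a bin
    # "T0" and "T1" stand in for the trees re-estimated on each bin.
    print(weighted_gene_trees(bins, ["T0", "T1"]))  # ['T0', 'T1', 'T1']
```

Replicating a bin's tree once per gene is one simple way to give it weight equal to its bin size when the downstream summary method (such as MP-EST) takes a plain, unweighted list of gene trees; unweighted binning would instead pass each bin's tree only once.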

Bibliographic details

Journal: other
Authors: Md Shamsuzzoha Bayzid; Siavash Mirarab; Bastien Boussau; Tandy Warnow
Volume (issue): 10(6)
Article number: e0129183
Total pages: 40
Original format: PDF

Similar documents

1. The “weighted ensemble” path sampling method is statistically exact for a broad class of stochastic processes and binning procedures [J]. Zhang, Bin W.; Jasnow, David; Zuckerman, Daniel M. Journal of Chemical Physics, 2010, no. 5.

机译：“加权集合”路径采样方法在统计上适用于广泛的随机过程和分类程序
2. Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction [J]. Allman, Elizabeth S.; Rhodes, John A.; Sullivant, Seth. Journal of Computational Biology, 2017, no. 2.

机译：统计上一致的k聚体系统进化树方法
3. Authors' response: development and validation of a questionnaire to evaluate infection control in oral radiology-consistent statistical analyses and methodology [J]. da Costa, Eliana D.; Corrente, Jose E.; Ambrosano, Glaucia M. B. DentoMaxilloFacial Radiology, 2017, no. 5.

机译：作者的回应：调查和验证调查问卷，以评估口腔放射学 - 一致统计分析和方法中的感染控制
4. Predicting functional gene-links from phylogenetic-statistical analyses of whole genomes [C]. Barker, D.; Pagel. 2005.

机译：从整个基因组的系统发育统计分析预测功能性基因链接
5. Association between lung cancer/multiple myeloma mortality and exposure to oncogenic viruses: statistical analyses using non-model- and model-based statistical methods and various control sampling schemes for cancer mortality in occupational cohorts [D]. Ndetan, Harrison Tatandam. 2009.

机译：肺癌/多发性骨髓瘤死亡率与致癌病毒暴露之间的关联-使用非基于模型和基于模型的统计方法以及各种对照抽样方案对职业人群的癌症死亡率进行统计分析。
6. The weighted ensemble path sampling method is statistically exact for a broad class of stochastic processes and binning procedures [O]. Zhang, Bin W.; Jasnow, David; Zuckerman, Daniel M.

机译：加权集合路径采样方法在统计上适用于广泛的随机过程和分类程序
7. Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses [O]. Bayzid, Md. Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien. 2015.

机译：加权统计分级：使统计上一致基因组规模的系统发育分析
