...
BMC Evolutionary Biology

Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

Abstract

Background
Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the long-branch attraction (LBA) artefact, can be misleading, in particular when taxon sampling is poor or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model misspecification. This suggests that better models of sequence evolution should be devised: models that are more robust to tree-reconstruction artefacts, even under the most challenging conditions.

Methods
We focus on a well-characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree. In that study, two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models. The first is a standard site-homogeneous model based on an empirical matrix of amino-acid replacement (WAG). The second is a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test that measures how well a model accounts for sequence saturation.

Results
Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.

Conclusion
The CAT model appears to be more robust than WAG against LBA artefacts. This is essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need to be accounted for in order to obtain more reliable phylogenetic trees.
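The argument in the Conclusion can be illustrated with a toy calculation: a small effective alphabet at a site raises the chance of convergences and reversions, and a model that ignores this underestimates saturation.

The Python sketch below is not the authors' code. It is also not CAT or WAG as implemented in any software. Every ingredient is a labeled simplification:
- an F81-style process at each site, driven by a site profile `pi`;
- a star tree with equal branch lengths in place of a real phylogeny;
- hypothetical profiles that are uniform over a few residues;
- a uniform 20-residue profile as a crude stand-in for a single site-homogeneous profile;
- the mean number of distinct residues per column as the test statistic.

Replicates are drawn from fixed parameter values rather than from posterior samples. The sketch therefore mimics the logic of a posterior predictive check without performing one.

```python
"""Toy illustration of site-specific profiles and saturation.

Not the authors' code and not the CAT or WAG models of any software package.
Hypothetical simplifications: an F81-style process per site, a star tree with
equal branch lengths, and mean site diversity as the test statistic.
"""
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def effective_alphabet_size(pi):
    """Inverse Simpson index: 20 for a uniform profile, k if uniform over k residues."""
    return 1.0 / sum(p * p for p in pi)


def convergence_probability(pi):
    """Probability that two independent stationary draws from pi coincide.

    Once two lineages are saturated, each is an independent draw from pi,
    so they share a residue by chance with probability sum(pi_i ** 2).
    """
    return sum(p * p for p in pi)


def concentrated_profile(allowed):
    """Hypothetical site profile: uniform over the residues in `allowed`, zero elsewhere."""
    return [1.0 / len(allowed) if a in allowed else 0.0 for a in ALPHABET]


def simulate_column(pi, n_taxa, branch_length, rng):
    """Simulate one alignment column on a star tree.

    The root residue is drawn from pi. On each branch it is kept with
    probability exp(-t) and otherwise redrawn from pi (an F81-type transition
    probability, with t in units of redraw events).
    """
    root = rng.choices(ALPHABET, weights=pi)[0]
    keep = math.exp(-branch_length)
    return [root if rng.random() < keep else rng.choices(ALPHABET, weights=pi)[0]
            for _ in range(n_taxa)]


def simulate_alignment(profiles, n_taxa, branch_length, rng):
    """Simulate one column per site profile."""
    return [simulate_column(pi, n_taxa, branch_length, rng) for pi in profiles]


def mean_site_diversity(columns):
    """Test statistic: mean number of distinct residues per column."""
    return sum(len(set(col)) for col in columns) / len(columns)


def predictive_pvalue(observed_stat, replicate_stats):
    """Fraction of replicate statistics at least as large as the observed one."""
    return sum(s >= observed_stat for s in replicate_stats) / len(replicate_stats)


if __name__ == "__main__":
    rng = random.Random(42)
    n_sites, n_taxa, t, n_reps = 200, 10, 2.0, 50  # long branches: saturated regime
    uniform = [1.0 / len(ALPHABET)] * len(ALPHABET)

    # Chance matching: a uniform profile vs a site tolerating only I, L, M, V.
    for name, pi in [("uniform", uniform), ("ILMV", concentrated_profile("ILMV"))]:
        print(name, round(effective_alphabet_size(pi), 2),
              round(convergence_probability(pi), 3))

    # Hypothetical "true" process: every site tolerates 3 random residues.
    site_profiles = [concentrated_profile(rng.sample(ALPHABET, 3)) for _ in range(n_sites)]
    observed = mean_site_diversity(simulate_alignment(site_profiles, n_taxa, t, rng))

    # Replicates from fixed parameters (a real test would use posterior draws).
    homogeneous = [mean_site_diversity(simulate_alignment([uniform] * n_sites, n_taxa, t, rng))
                   for _ in range(n_reps)]
    heterogeneous = [mean_site_diversity(simulate_alignment(site_profiles, n_taxa, t, rng))
                     for _ in range(n_reps)]

    print(f"observed mean diversity: {observed:.2f}")
    print(f"p-value, one profile for all sites: {predictive_pvalue(observed, homogeneous):.2f}")
    print(f"p-value, site-specific profiles:    {predictive_pvalue(observed, heterogeneous):.2f}")
```

The first two printed lines compare two profiles. A uniform profile has an effective size of 20, and two saturated lineages match by chance with probability 0.05. A four-residue site has an effective size of 4 and a chance-matching probability of 0.25.

In the toy alignment every column is restricted to three residues. Under long branches, the single-profile stand-in therefore predicts far more distinct residues per column than are observed. This places the observed statistic in the extreme tail, which signals that this stand-in underestimates convergences. The site-specific replicates are not expected to show this mismatch.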
