Journal: BMC Evolutionary Biology

Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes


Abstract

Background: Using molecular data generated by high-throughput next-generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86% of Madagascar’s serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci.

Methods: Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprising 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. In addition, we examined how subsets of these data performed compared to the complete Sanger and anchored datasets for phylogenetic accuracy using the same tree inference methodologies, as well as the program *BEAST, to determine if a full coalescent model for species tree estimation could generate robust results with fewer loci compared to the summary statistics species-tree approaches. We also examined the individual gene trees in comparison to the 377-locus species tree using the program MetaTree.

Results: Using the full anchored dataset under a variety of methods gave us the same, well-supported phylogeny for pseudoxyrhophiines. The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. The comparison of tree topologies from the concatenated and species-tree methods using different datasets indicated that the 5-locus dataset cannot be used to infer a correct phylogeny for the pseudoxyrhophiines under any method tested here, and that summary statistics methods require 50 or more loci to consistently recover the species tree inferred using the complete anchored dataset. However, as few as 15 loci may infer the correct topology when using the full coalescent species-tree method *BEAST. MetaTree analyses of each gene tree from the Sanger and anchored datasets found that none of the individual gene trees matched the 377-locus species tree, and that no gene trees were identical with respect to topology.

Conclusions: Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.
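The Methods compare tree topologies using Robinson-Foulds distances. As a rough illustration of what that metric measures, the sketch below counts the nontrivial bipartitions (splits) found in one tree but not the other. It is a minimal sketch under stated assumptions, not the software used in the study: the nested-tuple tree encoding, the function names, and the five toy taxon labels A to E are all hypothetical and are not taken from the paper's data.

```python
def leaf_set(tree):
    """Return the set of leaf labels in a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the nontrivial splits of the leaf set induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits: empty side, single leaf, or all-but-one leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Unnormalized Robinson-Foulds distance: size of the symmetric difference of splits."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same set of taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical toy topologies; the labels are placeholders, not study taxa.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(tree_1, tree_1))  # identical topologies
print(robinson_foulds(tree_1, tree_2))  # topologies differing in one grouping
```

A distance of 0 means the two trees share every split. Each grouping present in only one tree adds 1, so two trees that disagree about a single clade differ by 2.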
