Journal: Systematic Biology

Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants

Abstract

Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper-scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age > 50 Ma), for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for a proportion of missing data similar to that caused by dropout from mutation disruption. Simulations reveal that mutation disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony-informative sites and increased the number of loci with data shared across > 40 taxa by more than 10-fold. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies.
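The contrast the abstract draws between the two sources of missing data can be illustrated with a toy simulation. The sketch below is not the authors' simulation. It assumes a hypothetical four-taxon tree ((A,B),(C,D)), a single disruption probability per branch, and an independent per-sample dropout probability for low coverage. All function names and parameter values are illustrative assumptions.

```python
import random
from itertools import combinations

# Hypothetical rooted tree ((A,B),(C,D)): each branch maps to the tips below it.
TIPS = ("A", "B", "C", "D")
BRANCHES = {
    "stem_AB": ("A", "B"),
    "stem_CD": ("C", "D"),
    "tip_A": ("A",),
    "tip_B": ("B",),
    "tip_C": ("C",),
    "tip_D": ("D",),
}


def simulate_mutation_dropout(n_loci, p_branch, rng):
    """For each locus, return the set of tips that still carry it.

    A disruption on a branch (probability p_branch, an assumed value)
    removes the locus from every tip descended from that branch.
    """
    loci = []
    for _ in range(n_loci):
        present = set(TIPS)
        for tips_below in BRANCHES.values():
            if rng.random() < p_branch:
                present -= set(tips_below)
        loci.append(present)
    return loci


def simulate_coverage_dropout(n_loci, p_drop, rng):
    """For each locus, drop each tip independently with probability p_drop."""
    return [
        {tip for tip in TIPS if rng.random() >= p_drop}
        for _ in range(n_loci)
    ]


def shared_fraction(loci):
    """Fraction of loci recovered in both members of each pair of tips."""
    n = len(loci)
    return {
        pair: sum(1 for present in loci
                  if pair[0] in present and pair[1] in present) / n
        for pair in combinations(TIPS, 2)
    }


if __name__ == "__main__":
    rng = random.Random(0)
    mutation = shared_fraction(simulate_mutation_dropout(5000, 0.2, rng))
    coverage = shared_fraction(simulate_coverage_dropout(5000, 0.3, rng))
    for pair in mutation:
        print(pair, round(mutation[pair], 3), round(coverage[pair], 3))
```

Under these assumptions, sister pairs such as (A, B) are separated by fewer branches than pairs spanning the root, so they should share more loci under mutation disruption, whereas coverage dropout should give roughly equal sharing for every pair regardless of relatedness.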
