Journal: BMC Evolutionary Biology

Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants


Abstract

Background: The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict have been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by ILS; or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets.

Results: Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets, in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone.

Conclusion: This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species, and that this variability poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts ( https://bitbucket.org/blackrim/phyparts ), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
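
As a rough illustration of what "concordant" and "conflicting" bipartitions mean, the following Python sketch compares one species-tree bipartition with one gene-tree bipartition. It is not the phyparts implementation, and phyparts' exact definitions may differ. It assumes each bipartition is given as a pair of taxon sets, restricts both to their shared taxa (gene trees often lack some taxa), and applies the standard compatibility test for splits: two splits are compatible when at least one of the four pairwise intersections of their sides is empty. The names make_bipartition and classify, and the category labels, are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# A bipartition is modelled as the two sides of a split of the taxon set.
Bipartition = Tuple[FrozenSet[str], FrozenSet[str]]


def make_bipartition(side: Iterable[str], all_taxa: Iterable[str]) -> Bipartition:
    """Build a bipartition from one side and the full taxon set (hypothetical helper)."""
    one = frozenset(side)
    return (one, frozenset(all_taxa) - one)


def classify(species_bp: Bipartition, gene_bp: Bipartition) -> str:
    """Label gene_bp relative to species_bp (illustrative categories only)."""
    # Compare only on taxa present in both trees.
    shared = (species_bp[0] | species_bp[1]) & (gene_bp[0] | gene_bp[1])
    s = (species_bp[0] & shared, species_bp[1] & shared)
    g = (gene_bp[0] & shared, gene_bp[1] & shared)

    # A split with fewer than two taxa on a side says nothing about grouping.
    if min(len(s[0]), len(s[1]), len(g[0]), len(g[1])) < 2:
        return "uninformative"

    # Same split, regardless of which side is listed first.
    if frozenset(s) == frozenset(g):
        return "concordant"

    # Splits are compatible if some pairwise intersection of sides is empty.
    intersections = [s[0] & g[0], s[0] & g[1], s[1] & g[0], s[1] & g[1]]
    if any(len(x) == 0 for x in intersections):
        return "compatible"
    return "conflicting"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]
    species = make_bipartition(["A", "B"], taxa)
    print(classify(species, make_bipartition(["A", "B"], taxa)))
    print(classify(species, make_bipartition(["A", "C"], taxa)))
```

Counting these labels per species-tree node across all gene trees gives the kind of per-node concordance and conflict tallies the abstract describes. Support thresholds and gene duplications are left out of this sketch.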
