Source: U.S. National Institutes of Health literature, Genome Biology and Evolution

A Genome-Scale Investigation of How Sequence Function and Tree-Based Gene Properties Influence Phylogenetic Inference


Abstract

Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or the optimality criterion used, can strongly influence inference. In contrast, much less is known about how the properties of the data, typically genes, affect phylogenetic inference. We investigated how 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) relate to each other and to three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine (GC) content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlation, albeit weaker in strength. Examination of the relative importance of each gene property for phylogenetic signal identified gene alignment length, along with the numbers of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
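To make the partial-correlation step concrete, the following is a minimal sketch of how a correlation between one gene property and one measure of phylogenetic signal could be computed while controlling for two other variables at once (here standing in for evolutionary rate and alignment length). It is an illustration only: the abstract does not say which correlation coefficient or software the authors used, so the use of Pearson correlation, the recursive first-order formula, and all function and variable names below are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing one control variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_two_controls(x, y, z, w):
    """Correlation of x and y controlling for z and w simultaneously.

    Applies the first-order formula recursively: first remove z from
    every pair, then remove w from the z-adjusted correlations.
    """
    r_xy_z = partial_first_order(pearson(x, y), pearson(x, z), pearson(y, z))
    r_xw_z = partial_first_order(pearson(x, w), pearson(x, z), pearson(w, z))
    r_yw_z = partial_first_order(pearson(y, w), pearson(y, z), pearson(w, z))
    return partial_first_order(r_xy_z, r_xw_z, r_yw_z)


# Hypothetical toy values, not data from the study.
gene_property = [2.0, 4.5, 3.1, 6.2, 5.0, 7.7, 6.9, 9.4]
signal = [1.2, 3.9, 2.5, 5.1, 5.6, 6.8, 7.5, 8.0]
evol_rate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
align_len = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

raw = pearson(gene_property, signal)
adjusted = partial_corr_two_controls(gene_property, signal, evol_rate, align_len)
print(f"raw r = {raw:.3f}, partial r = {adjusted:.3f}")
```

A rank-based variant (for example, Spearman) could be obtained by ranking each variable before calling these functions; whether the study did so is not stated in the abstract.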
