PLoS Computational Biology

Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human

Abstract

Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
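
The terms "1:1 orthologues" and "in-paralogues" used in the abstract can be illustrated with a minimal Python sketch. This is not PhyOP and does not reproduce its phylogenetic, synonymous-rate-based method; it only assumes that some upstream step has already produced a list of predicted (dog gene, human gene) orthologue pairs, and shows how such pairs group into 1:1, 1:many, many:1, and many:many relationships. The function name `classify_orthology` and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def classify_orthology(pairs):
    """Group (dog_gene, human_gene) pairs into connected components
    and label each component by its dog:human cardinality.

    Illustrative only: the input pairs are assumed to come from some
    upstream orthology predictor; this is not the PhyOP algorithm.
    """
    # Build an undirected bipartite graph. Nodes are tagged with their
    # species so identical identifiers in the two genomes cannot collide.
    adj = defaultdict(set)
    for dog, human in pairs:
        adj[("dog", dog)].add(("human", human))
        adj[("human", human)].add(("dog", dog))

    seen = set()
    groups = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

        dogs = sorted(g for species, g in component if species == "dog")
        humans = sorted(g for species, g in component if species == "human")
        dog_side = "1" if len(dogs) == 1 else "many"
        human_side = "1" if len(humans) == 1 else "many"
        groups.append(
            {"dog": dogs, "human": humans, "type": f"{dog_side}:{human_side}"}
        )
    return groups


# Hypothetical identifiers, for illustration only.
example_pairs = [
    ("dogA", "humA"),    # one dog gene, one human gene
    ("dogB", "humB1"),   # one dog gene with two human co-orthologues
    ("dogB", "humB2"),
]

for group in classify_orthology(example_pairs):
    print(group["type"], group["dog"], group["human"])
```

In the second group, `humB1` and `humB2` would be human in-paralogues in the abstract's terminology: species-specific duplicates that share a single dog orthologue.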
