Briefings in Bioinformatics

Computational methods for Gene Orthology inference

Abstract

Accurate inference of orthologous genes is a prerequisite for most comparative genomics studies and is also important for the functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on comparing an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs follow directly from the definition of orthology: they are those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal in principle for orthology identification, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple ‘tree-like’ mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, and efficient graph-theoretical implementations enable comparisons of thousands of genomes. Comparisons of the tree-based and heuristic approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny can also aid in the identification of orthologs. Often, tree-based, sequence-similarity-based, and synteny-based approaches can be combined into flexible hybrid methods.
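
The heuristic idea of taking "the closest homologous pairs" as probable orthologs can be illustrated with a reciprocal (bidirectional) best-hit search between two genomes. The following is only a minimal sketch under stated assumptions, not the procedure of any particular tool: the function name `reciprocal_best_hits`, the input format (a dictionary of pairwise similarity scores), and the toy scores are all hypothetical and chosen for illustration.

```python
def reciprocal_best_hits(scores):
    """Return gene pairs that are each other's best-scoring hit.

    `scores` maps (gene_a, gene_b) -> similarity score, where gene_a is
    from genome A and gene_b is from genome B. This input format is an
    assumption made for the sketch. Ties are broken by first occurrence.
    """
    best_for_a = {}  # gene in A -> (score, best-scoring gene in B)
    best_for_b = {}  # gene in B -> (score, best-scoring gene in A)
    for (gene_a, gene_b), score in scores.items():
        if gene_a not in best_for_a or score > best_for_a[gene_a][0]:
            best_for_a[gene_a] = (score, gene_b)
        if gene_b not in best_for_b or score > best_for_b[gene_b][0]:
            best_for_b[gene_b] = (score, gene_a)
    # Keep a pair only if the best hit is mutual in both directions.
    return sorted(
        (gene_a, gene_b)
        for gene_a, (_, gene_b) in best_for_a.items()
        if best_for_b[gene_b][1] == gene_a
    )


# Toy scores, made up for illustration (not real alignment output).
toy_scores = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a2", "b1"): 60.0,
    ("a3", "b2"): 120.0,
}
pairs = reciprocal_best_hits(toy_scores)
print(pairs)
```

In this toy input, `a3` has `b2` as its best hit, but `b2` scores higher against `a2`, so `a3` is left without a putative ortholog. A pairwise criterion like this makes no use of a gene tree, which is why it is fast but can mislabel genes after duplication or loss.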
