BMC Bioinformatics

Primary orthologs from local sequence context

Abstract

The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes do not code for proteins, yielding a need to infer the evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-centric, as opposed to gene-centric, modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed “primary” (or “positional”) orthologs. Methods based solely on similarity do not reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive.

We demonstrate that short-range sequence context, as short as a single “maximal” match, distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as “non-nested maximal matches”: maximal matches that are not nested within other maximal matches. It emerges that on both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one-thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee.
We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free, and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
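The idea of a "non-nested maximal match" can be illustrated with a small Python sketch. This is not the authors' implementation: it is a brute-force, quadratic-time toy that would not scale to whole genomes, where an index structure such as a suffix array would presumably be needed. The names `Match`, `maximal_matches`, and `non_nested` are hypothetical. The sketch also assumes one particular reading of "nested": a match is treated as nested if its interval on either sequence lies inside the interval of a strictly longer maximal match on that same sequence. The abstract does not spell out this detail.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    a_start: int  # start position in sequence a
    b_start: int  # start position in sequence b
    length: int   # number of matching characters


def maximal_matches(a: str, b: str, min_len: int = 1) -> List[Match]:
    """Brute-force exact matches that cannot be extended left or right."""
    matches = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip start points that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                matches.append(Match(i, j, k))
    return matches


def _inside(start: int, length: int, o_start: int, o_length: int) -> bool:
    """True if [start, start+length) lies within [o_start, o_start+o_length)."""
    return o_start <= start and start + length <= o_start + o_length


def non_nested(matches: List[Match]) -> List[Match]:
    """Drop matches nested in a strictly longer match on either sequence.

    Assumption: 'nested' means interval containment on sequence a OR b.
    """
    kept = []
    for m in matches:
        nested = any(
            o.length > m.length
            and (
                _inside(m.a_start, m.length, o.a_start, o.length)
                or _inside(m.b_start, m.length, o.b_start, o.length)
            )
            for o in matches
        )
        if not nested:
            kept.append(m)
    return kept


if __name__ == "__main__":
    a, b = "ACGTT", "ACGTTGGCGT"
    all_matches = maximal_matches(a, b, min_len=3)
    print(all_matches)
    print(non_nested(all_matches))
```

In this toy input, the 3-base match `CGT` (position 1 in `a`, position 7 in `b`) is maximal but falls inside the longer 5-base match on `a`, so the filter discards it and keeps only the longer match. The `min_len` cutoff exists only to keep the toy output short; the abstract describes the method itself as parameter-free.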