Primary orthologs from local sequence context

Kun Gao; Jonathan Miller


Abstract

The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes do not code for proteins, which creates a need to infer the evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-centric, as opposed to gene-centric, modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed “primary” (or “positional”) orthologs. Methods based solely on similarity do not reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive.
We demonstrate that short-range sequence context, as short as a single “maximal” match, distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as “non-nested maximal matches”: maximal matches that are not nested within other maximal matches. On both nucleotide and gene scales, non-nested maximal matches recapitulate primary (positional) orthologs with high precision and high recall, while the computation takes less than one thirtieth of the time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee.
We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free, and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
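
The idea of a non-nested maximal match can be illustrated on toy sequences. The sketch below is not the authors' implementation. It is a brute-force illustration under stated assumptions: a maximal match is taken to be an exact match that cannot be extended to the left or right; a match is treated as "nested" when its interval in either sequence lies inside the interval of a strictly longer maximal match; reverse-complement matches are ignored. The names `Match`, `maximal_matches`, `non_nested`, and the `min_len` cutoff are hypothetical and exist only to keep the toy output readable.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    a_start: int  # 0-based start of the match in sequence a
    b_start: int  # 0-based start of the match in sequence b
    length: int   # number of identical characters


def maximal_matches(a: str, b: str, min_len: int = 1) -> List[Match]:
    """Brute-force exact matches that cannot be extended left or right.

    min_len is a hypothetical cutoff for this toy example only.
    """
    found = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip positions that merely continue a match started earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                found.append(Match(i, j, k))
    return found


def _inside(start: int, length: int, other_start: int, other_length: int) -> bool:
    """True if [start, start+length) lies within a strictly longer interval."""
    return (other_length > length
            and other_start <= start
            and start + length <= other_start + other_length)


def non_nested(matches: List[Match]) -> List[Match]:
    """Keep matches whose interval is not nested in a longer match.

    Assumption: nesting in either sequence is enough to discard a match.
    """
    kept = []
    for m in matches:
        nested = any(
            _inside(m.a_start, m.length, o.a_start, o.length)
            or _inside(m.b_start, m.length, o.b_start, o.length)
            for o in matches
        )
        if not nested:
            kept.append(m)
    return kept


if __name__ == "__main__":
    a = "TACGTGA"
    b = "CACGTGCCCGTT"
    all_matches = maximal_matches(a, b, min_len=3)
    print(non_nested(all_matches))
```

In this toy pair, `ACGTG` matches at offset 1 in both sequences, while the shorter `CGT` also matches a second location in `b`. Because the `CGT` interval in `a` sits inside the longer `ACGTG` match, the sketch discards it and keeps only the longer match. The quadratic scan is meant for illustration; genome-scale data would need an index-based approach, which the abstract does not specify.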


Bibliographic record

Source
BMC Bioinformatics | 2020, Issue 1 | 22 pages
Authors
Kun Gao; Jonathan Miller
Original format: PDF
Keywords
Primary/positional orthology; Genomic context; K-mer; Reciprocal best hit; Whole-genome alignment

Record added: 2022-08-18 23:39:35

