Journal: Genome Research

Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes


Abstract

In the conventional view of prokaryotic genome organization, promoters precede operons, and ribosome binding sites (RBSs) with the Shine-Dalgarno consensus precede genes. However, recent experimental research suggests a more diverse picture, which motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training to find species-specific (native) genes, along with an array of precomputed “heuristic” models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic of leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, screening of ∼5000 representative prokaryotic genomes with GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBSs in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts.
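The abstract contrasts gene starts preceded by a Shine-Dalgarno RBS with starts that lack one, as in leaderless transcription. The following is a minimal sketch of that distinction only. It is not the GeneMarkS-2 model, which the abstract describes as derived by self-training. The core motif `AGGAGG`, the minimum match length, and the function names `best_sd_match` and `fraction_with_sd` are all assumptions introduced for illustration.

```python
# Illustrative only: a naive exact-substring check, not GeneMarkS-2's model.
# Assumption: "AGGAGG" as the Shine-Dalgarno core used for matching.
SD_CONSENSUS = "AGGAGG"


def best_sd_match(upstream, min_len=4):
    """Return (position, matched substring) for the longest exact substring
    of SD_CONSENSUS (at least min_len long) found in `upstream`, else None."""
    upstream = upstream.upper()
    # Try the longest candidate substrings first, then shorter ones.
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for offset in range(len(SD_CONSENSUS) - length + 1):
            kmer = SD_CONSENSUS[offset:offset + length]
            pos = upstream.find(kmer)
            if pos != -1:
                return pos, kmer
    return None


def fraction_with_sd(upstream_regions):
    """Fraction of upstream regions that contain an SD-like match."""
    if not upstream_regions:
        return 0.0
    hits = sum(1 for region in upstream_regions
               if best_sd_match(region) is not None)
    return hits / len(upstream_regions)
```

Applied to the sequences immediately upstream of predicted gene starts, a low fraction would be one crude hint that a genome does not rely mainly on the Shine-Dalgarno consensus. A real analysis would use a probabilistic motif model rather than exact substring matching.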
