Journal: Bioinformatics

Operon prediction without a training set


Abstract

Motivation: Annotation of operons in a bacterial genome is an important step in determining an organism's transcriptional regulatory program. While extensive studies of operon structure have been carried out in a few species such as Escherichia coli, fewer resources exist to inform operon prediction in newly sequenced genomes. In particular, many extant operon finders require a large body of training examples to learn the properties of operons in the target organism. For newly sequenced genomes, such examples are generally not available; moreover, a model of operons trained on one species may not reflect the properties of other, distantly related organisms. We encountered these issues in the course of predicting operons in the genome of Bacteroides thetaiotaomicron (B.theta), a common anaerobe that is a prominent component of the normal adult human intestinal microbial community.

Results: We describe an operon predictor designed to work without extensive training data. We rely on a small set of a priori assumptions about the properties of the genome being annotated that permit estimation of the probability that two adjacent genes lie in a common operon. Predictions integrate several sources of information, including intergenic distance, common functional annotation and a novel formulation of conserved gene order. We validate our predictor both on the known operons of E.coli and on the genome of B.theta, using expression data to evaluate our predictions in the latter.
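The abstract does not give the formulas behind the predictor, so the following is only a minimal sketch of the general idea of combining several evidence sources into one probability that two adjacent genes share an operon. It uses a naive Bayes style log-odds sum, which is a swapped-in technique and not necessarily the authors' method. The names `AdjacentPair` and `same_operon_probability`, the prior, the distance cutoffs and every likelihood ratio are hypothetical values chosen for illustration, not figures from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjacentPair:
    """Evidence about two neighbouring genes (hypothetical structure)."""
    same_strand: bool       # genes in one operon are transcribed from the same strand
    intergenic_bp: int      # gap between the genes; negative means they overlap
    shared_function: bool   # both genes carry a common functional annotation
    conserved_order: bool   # the gene pair is also adjacent in other genomes


# Illustrative prior, NOT taken from the paper.
PRIOR_SAME_OPERON = 0.5


def distance_likelihood_ratio(intergenic_bp: int) -> float:
    """Assumed likelihood ratio for the intergenic distance (made-up cutoffs)."""
    if intergenic_bp < 50:
        return 4.0    # short gaps assumed to favour a shared operon
    if intergenic_bp < 200:
        return 1.0    # treated as uninformative
    return 0.25       # long gaps assumed to favour an operon boundary


def same_operon_probability(pair: AdjacentPair) -> float:
    """Combine the evidence as a sum of log likelihood ratios (naive Bayes)."""
    if not pair.same_strand:
        return 0.0    # opposite strands cannot be co-transcribed
    log_odds = math.log(PRIOR_SAME_OPERON / (1.0 - PRIOR_SAME_OPERON))
    log_odds += math.log(distance_likelihood_ratio(pair.intergenic_bp))
    log_odds += math.log(3.0 if pair.shared_function else 0.8)   # assumed ratios
    log_odds += math.log(5.0 if pair.conserved_order else 0.9)   # assumed ratios
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    close_pair = AdjacentPair(True, 12, True, True)
    distant_pair = AdjacentPair(True, 450, False, False)
    print(round(same_operon_probability(close_pair), 3))
    print(round(same_operon_probability(distant_pair), 3))
```

Working in log-odds keeps each evidence source as an independent additive term, so a new source can be added or dropped without touching the others. The independence assumption this implies is a simplification made for the sketch.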
