Journal of Molecular Biology

A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome.

Abstract

We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or the HDEL motif) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of approximately 1300 yeast proteins with experimentally known localization by merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an analogy to the formalism of quantum mechanics, gives better accuracy in determining relative compartment populations than that obtained by simply tallying the localization predictions for individual proteins (on the yeast proteins with known localization, 92% versus 74%). Our training and testing also highlight which of the 30 features are informative and which are redundant (19 being particularly useful). After developing our system, we apply it to the 4700 yeast proteins with currently unknown localization and estimate the relative population of the various compartments in the entire yeast genome. An unbiased prior is essential to this extrapolated estimate; for this, we use the MIPS localization catalogue and adapt recent results on the localization of yeast proteins obtained by Snyder and colleagues using a minitransposon system. Our final localizations for all approximately 6000 proteins in the yeast genome are available over the web at: http://bioinfo.mbb.yale.edu/genome/localize. Copyright 2000 Academic Press.
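
The Bayesian update and the two ways of estimating compartment populations described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats features as conditionally independent given the compartment (a naive-Bayes simplification) and estimates populations by averaging per-protein probability vectors. The compartment list, the function names, and any likelihood values passed in are hypothetical.

```python
from collections import Counter

# Illustrative subset of compartments; the actual system covers more.
COMPARTMENTS = ["cytoplasm", "nucleus", "mitochondria"]


def bayes_update(prior, likelihood):
    """One update step: posterior is proportional to prior * P(feature | compartment)."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


def localize(prior, feature_likelihoods):
    """Apply one update per observed feature, starting from the prior.

    Assumes features are conditionally independent given the compartment.
    """
    posterior = dict(prior)
    for likelihood in feature_likelihoods:
        posterior = bayes_update(posterior, likelihood)
    return posterior


def population_by_sum(posteriors):
    """Estimate compartment fractions by averaging probability vectors.

    No protein has to be assigned to a single compartment.
    """
    n = len(posteriors)
    return {c: sum(p[c] for p in posteriors) / n for c in COMPARTMENTS}


def population_by_tally(posteriors):
    """Estimate compartment fractions by counting each protein's top prediction."""
    counts = Counter(max(p, key=p.get) for p in posteriors)
    n = len(posteriors)
    return {c: counts.get(c, 0) / n for c in COMPARTMENTS}
```

Under this sketch, a protein with a 0.55 / 0.45 split between two compartments contributes fractionally to both in `population_by_sum`, whereas `population_by_tally` assigns it wholly to the more probable one. That difference is one plausible reading of why the abstract reports better population estimates without definite per-protein localizations.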