Journal: Algorithms for Molecular Biology

Protein (multi-)location prediction: using location inter-dependencies in a probabilistic framework


Abstract

Motivation: Knowing the location of a protein within the cell is important for understanding its function, role in biological processes, and potential use as a drug target. Much progress has been made in developing computational methods that predict single locations for proteins. Most such methods are based on the over-simplifying assumption that proteins localize to a single location. However, it has been shown that proteins localize to multiple locations. While a few recent systems attempt to predict multiple locations of proteins, their performance leaves much room for improvement. Moreover, they typically treat locations as independent and do not attempt to utilize possible inter-dependencies among locations. Our hypothesis is that directly incorporating inter-dependencies among locations into both the classifier-learning and the prediction process can improve location prediction performance.

Results: We present a new method and a preliminary system we have developed that directly incorporates inter-dependencies among locations into the location-prediction process of multiply-localized proteins. Our method is based on a collection of Bayesian network classifiers, where each classifier is used to predict a single location. Learning the structure of each Bayesian network classifier takes into account inter-dependencies among locations, and the prediction process uses estimates involving multiple locations. We evaluate our system on a dataset of single- and multi-localized proteins (the most comprehensive protein multi-localization dataset currently available, derived from the DBMLoc dataset). Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to a top performing system (YLoc+), without being restricted only to location-combinations present in the training set.
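
The general idea of one classifier per location, each informed by estimates for the other locations, can be illustrated with a small sketch. This is not the authors' system: it swaps Bayesian network structure learning for a fixed naive Bayes structure, assumes binary protein features, and uses hypothetical names (`train_nb`, `predict_nb`, `train`, `predict`) and made-up toy data. Each location gets an "independent" classifier that sees only the features, and a "dependent" classifier that also sees the other locations; at prediction time the independent classifiers supply the estimates of the other locations.

```python
from math import log

def train_nb(rows, targets):
    """Bernoulli naive Bayes over binary inputs, with Laplace smoothing."""
    n_inputs = len(rows[0])
    model = {}
    for c in (0, 1):
        members = [r for r, t in zip(rows, targets) if t == c]
        prior = (len(members) + 1) / (len(rows) + 2)
        # Smoothed probability that each input equals 1 given class c.
        on = [(sum(r[j] for r in members) + 1) / (len(members) + 2)
              for j in range(n_inputs)]
        model[c] = (prior, on)
    return model

def predict_nb(model, row):
    """Return 1 if class 1 has the higher log-posterior score, else 0."""
    scores = {}
    for c, (prior, on) in model.items():
        score = log(prior)
        for x, p in zip(row, on):
            score += log(p if x else 1 - p)
        scores[c] = score
    return 1 if scores[1] > scores[0] else 0

def train(features, labels):
    """Train one independent and one dependent classifier per location."""
    n_loc = len(labels[0])
    independent, dependent = [], []
    for k in range(n_loc):
        y = [lab[k] for lab in labels]
        independent.append(train_nb(features, y))
        # Dependent classifier: features plus the true labels of the
        # other locations (assumption: a fixed, not learned, structure).
        augmented = [tuple(f) + tuple(lab[:k]) + tuple(lab[k + 1:])
                     for f, lab in zip(features, labels)]
        dependent.append(train_nb(augmented, y))
    return independent, dependent

def predict(models, row):
    """Predict all locations, feeding first-pass estimates of the other
    locations into each dependent classifier."""
    independent, dependent = models
    first_pass = [predict_nb(m, row) for m in independent]
    result = []
    for k, m in enumerate(dependent):
        others = first_pass[:k] + first_pass[k + 1:]
        result.append(predict_nb(m, tuple(row) + tuple(others)))
    return tuple(result)

# Toy data (invented for illustration): two binary features, two locations.
features = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
labels = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
models = train(features, labels)
print(predict(models, (1, 0)))
```

Because each location is decided by its own classifier, a sketch like this can output any combination of locations, including combinations that never occur together in the training data.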
