IEEE International Conference on Bioinformatics and Biomedicine Workshop

Domain Content Based Protein Function Prediction Using Incomplete GO Annotation Information


Abstract

Given the essential role of proteins in life processes, computational assignment of protein function has become one of the most important tasks in bioinformatics. Although the Gene Ontology (GO) is widely used for functional annotation, new approaches that address annotation incompleteness while leveraging the GO framework are urgently needed. In this paper, two new models are proposed to predict GO terms from domain content: a correlation coefficient based model (CC-M) and a support vector machine (SVM) based model (SVM-M). We have developed our models as predictors for all GO terms with manually curated annotations. Compared with the previously published Bayesian probabilistic approach [Forslund et al., 2008], our methods are shown to handle incomplete training data better. In particular, CC-M is suited to GO terms with extremely low occurrence frequency, and SVM-M to the remaining GO terms. CC-M and SVM-M are therefore integrated into a single model (CC-SVM) that combines their respective advantages.
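The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that the correlation in CC-M can be approximated by the phi coefficient (Pearson correlation of two binary variables) between the presence of a domain and the annotation with a GO term, that a protein is scored by its best-correlated domain, and that the choice between CC-M and SVM-M is made by a simple count threshold. The function names, the `min_count` threshold, and the domain and term identifiers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def phi_coefficient(has_domain, has_term):
    """Pearson correlation between two equal-length binary (0/1) sequences."""
    pairs = list(zip(has_domain, has_term))
    n11 = sum(1 for d, t in pairs if d and t)
    n10 = sum(1 for d, t in pairs if d and not t)
    n01 = sum(1 for d, t in pairs if not d and t)
    n00 = sum(1 for d, t in pairs if not d and not t)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Correlation is undefined if either variable is constant; use 0.0 then.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def train_cc(proteins, term):
    """Return {domain: phi} for one GO term.

    proteins: list of (set_of_domains, set_of_go_terms) pairs.
    """
    labels = [int(term in terms) for _, terms in proteins]
    all_domains = set().union(*(doms for doms, _ in proteins))
    return {
        dom: phi_coefficient([int(dom in doms) for doms, _ in proteins], labels)
        for dom in all_domains
    }


def cc_score(weights, domains):
    """Assumed scoring rule: take the best-correlated domain of the protein."""
    return max((weights.get(d, 0.0) for d in domains), default=0.0)


def choose_model(proteins, term, min_count=3):
    """Assumed dispatch rule: rare terms go to CC-M, the rest to SVM-M."""
    count = sum(1 for _, terms in proteins if term in terms)
    return "CC-M" if count < min_count else "SVM-M"


# Toy training set with hypothetical domain and GO term identifiers.
training = [
    ({"DOM_A"}, {"GO:X"}),
    ({"DOM_A", "DOM_B"}, {"GO:X"}),
    ({"DOM_B"}, set()),
    ({"DOM_C"}, set()),
]
weights = train_cc(training, "GO:X")
```

In this toy data `DOM_A` co-occurs perfectly with `GO:X`, so a new protein carrying `DOM_A` receives the highest possible score, while `DOM_C` is negatively correlated with the term. A real SVM-M component would replace the placeholder label returned by `choose_model` with a trained classifier over domain-content features.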
