Conference: 2011 IEEE International Conference on Computer Science and Automation Engineering

Predicting non-classical secretory proteins by using Gene Ontology terms and physicochemical properties



Abstract

Eukaryotic secretory proteins that traverse the classical ER-Golgi pathway are usually characterized by short N-terminal signal peptides. However, several secretory proteins lacking signal peptides have been found to be exported by a non-classical secretion pathway. A computational approach that predicts non-classical secretory proteins without relying on N-terminal signal peptides is therefore needed. Several prediction methods using various types of features have been proposed for secretory proteins, but their prediction performance appears unsatisfactory. This study proposes an SVM-based prediction method, ProSec-iGOX, which uses a major set of informative Gene Ontology (GO) terms and a minor set of auxiliary features. The auxiliary features are physicochemical properties, which are useful when a query protein sequence has no homologous protein with annotated GO terms. Two data sets, S25 and S40, with sequence identity of 25% and 40%, respectively, are adopted for performance comparison. ProSec-iGOX yields test accuracies of 95.1% and 96.8% on S25 and S40, respectively. The latter accuracy (96.8%) is significantly higher than that of SPRED (82.2%), which uses the frequency of tri-peptides and short peptides, secondary structure, and physicochemical properties as input features to a random forest classifier. The experimental results show that GO terms are effective features for predicting non-classical secretory proteins.
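The feature design described in the abstract (a major set of GO-term features plus a minor set of physicochemical features) can be illustrated with a minimal Python sketch. This is not the ProSec-iGOX implementation: the abstract does not say which GO terms or which physicochemical properties are used, so the three GO identifiers, the choice of the Kyte-Doolittle hydropathy scale, and all function names below are assumptions made purely for illustration.

```python
# Illustrative sketch only; NOT the ProSec-iGOX feature set.
# Assumptions: the GO terms listed in the usage example, the use of the
# Kyte-Doolittle hydropathy scale, and every function name here.

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def go_indicator(annotated_terms, informative_terms):
    """Return a 0/1 vector: 1 where an informative GO term is annotated."""
    annotated = set(annotated_terms)
    return [1 if term in annotated else 0 for term in informative_terms]


def mean_hydropathy(sequence):
    """Average hydropathy over recognized residues; 0.0 if there are none."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def build_features(sequence, annotated_terms, informative_terms):
    """Concatenate GO indicators (major set) with one auxiliary property."""
    return go_indicator(annotated_terms, informative_terms) + [mean_hydropathy(sequence)]


if __name__ == "__main__":
    # Hypothetical informative-term list chosen only for this example.
    terms = ["GO:0005576", "GO:0005615", "GO:0005634"]
    print(build_features("AIL", ["GO:0005576"], terms))
    # With no GO annotation, only the auxiliary feature carries information.
    print(build_features("AIL", [], terms))
```

When a protein has no annotated GO terms, the indicator part of the vector is all zeros and only the physicochemical value distinguishes it, which is the situation the auxiliary features are meant to cover. Vectors of this kind could then be passed to any SVM implementation, for example `sklearn.svm.SVC`.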
