Journal: BioSystems

A multiple information fusion method for predicting subcellular locations of two different types of bacterial protein simultaneously


Abstract

Predicting the subcellular localization of bacterial proteins is an important problem in bioinformatics, with applications in drug design and other areas. Many computational tools for protein subcellular localization prediction have been developed in recent decades. In this study, we first introduce three kinds of protein sequence encoding schemes: physicochemical-based, evolutionary-based, and GO-based. The original and consensus sequences are combined with physicochemical properties. Element information from different rows and columns of the position-specific scoring matrix (PSSM) is considered simultaneously to capture more of the essential information. Computational methods based on gene ontology (GO) have been demonstrated to be superior to methods based on other features. Principal component analysis (PCA) is then applied for feature selection, and the reduced vectors are input to a support vector machine (SVM) to predict protein subcellular localization. With the jackknife test, the proposed method achieves prediction accuracies of 98.28% and 97.87% on stringent Gram-positive (Gpos) and Gram-negative (Gneg) datasets, respectively. Finally, we calculate the "absolute true overall accuracy (ATOA)", which is stricter than overall accuracy. The ATOA obtained with the proposed method reaches 97.32% for Gpos and 93.06% for Gneg. Judging from both the rationality of the testing procedure and the success rates of the test results, the current method can improve the prediction quality of protein subcellular localization. (C) 2015 Elsevier Ireland Ltd. All rights reserved.
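
The abstract states that ATOA is stricter than overall accuracy but does not give either formula. The sketch below shows one plausible reading, assuming the usual multi-label definitions: overall accuracy counts each correctly predicted location separately, while ATOA counts a protein as correct only when its predicted set of locations matches the true set exactly. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def overall_locative_accuracy(true_sets: List[Set[str]],
                              pred_sets: List[Set[str]]) -> float:
    """Fraction of true (protein, location) pairs that were predicted.

    Assumed definition: every true location counts once, so a protein
    with two locations contributes two chances to score a hit.
    """
    hits = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    total = sum(len(t) for t in true_sets)
    return hits / total


def absolute_true_overall_accuracy(true_sets: List[Set[str]],
                                   pred_sets: List[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set is exactly right.

    Assumed definition: any missing or extra location makes the whole
    prediction for that protein count as wrong.
    """
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


# Toy data (hypothetical): four proteins, one of which has two locations.
true_sets = [{"cytoplasm"}, {"cell membrane", "periplasm"},
             {"extracellular"}, {"cell wall"}]
pred_sets = [{"cytoplasm"}, {"cell membrane"},
             {"extracellular"}, {"cytoplasm"}]

print(overall_locative_accuracy(true_sets, pred_sets))       # 3 of 5 locations
print(absolute_true_overall_accuracy(true_sets, pred_sets))  # 2 of 4 proteins
```

In this toy case the partially correct second protein still earns credit under overall accuracy but none under ATOA, which is why ATOA can never exceed overall accuracy under these assumed definitions.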
