Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins

Abstract

A statistical analysis indicated that, of the 35 016 Gram-positive bacterial proteins in the recent Swiss-Prot database, ~57% have no subcellular location annotation. In the gene ontology database the corresponding figure is ~67%, so the share of proteins without subcellular component annotations is even higher there. With the avalanche of gene products generated in the post-genomic era, the number of such location-unknown entries will keep increasing. An automated method for identifying their subcellular localization promptly and accurately is therefore highly desirable, because the information obtained is very useful for both basic research and drug discovery. In view of this, an ensemble classifier called ‘Gpos-PLoc’ was developed for predicting the subcellular localization of Gram-positive proteins. The new predictor works by fusing many basic classifiers, each of which was engineered according to the optimized evidence-theoretic K-nearest neighbors rule. As a demonstration, tests were performed on Gram-positive proteins among the following five subcellular location sites: (1) cell wall, (2) cytoplasm, (3) extracell, (4) periplasm and (5) plasma membrane. To eliminate redundancy and homology bias, only proteins with < 25% sequence identity to any other protein in the same subcellular location were included in the benchmark datasets. The overall success rates achieved by Gpos-PLoc were > 80% in both the jackknife cross-validation test and the independent dataset test, implying that Gpos-PLoc might become a very useful vehicle for expediting the analysis of Gram-positive bacterial proteins. Gpos-PLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/Gpos/.
To support investigators in the relevant areas, a downloadable file at the same website lists the results identified by Gpos-PLoc for 31 898 Gram-positive bacterial protein entries in the Swiss-Prot database that either have no subcellular location annotation or are annotated with uncertain terms such as ‘probable’, ‘potential’, ‘perhaps’ and ‘by similarity’. These large-scale results will be updated once a year to include new Gram-positive bacterial protein entries and to reflect the continuing development of Gpos-PLoc.
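
The two ideas the abstract relies on, fusing several basic classifiers and scoring the result with a jackknife test, can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain Euclidean K-nearest-neighbor voting stands in for the optimized evidence-theoretic K-nearest neighbors rule, and the function names, the choice of K values and the two-dimensional toy feature vectors are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Plain KNN: majority label among the k nearest training samples.

    Assumption: Euclidean distance on toy feature vectors, used here as a
    stand-in for the evidence-theoretic rule named in the abstract.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several basic KNN classifiers (one per K) by majority vote."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_success_rate(data, ks=(1, 3, 5)):
    """Jackknife (leave-one-out) test: predict each sample from all others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out sample i
        if ensemble_predict(rest, features, ks) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: (feature vector, subcellular location label).
toy_data = [
    ((0.0, 0.0), "cytoplasm"),
    ((0.1, 0.2), "cytoplasm"),
    ((0.2, 0.1), "cytoplasm"),
    ((0.2, 0.3), "cytoplasm"),
    ((5.0, 5.0), "cell wall"),
    ((5.1, 4.9), "cell wall"),
    ((4.9, 5.2), "cell wall"),
    ((5.2, 5.1), "cell wall"),
]

if __name__ == "__main__":
    print("jackknife success rate:", jackknife_success_rate(toy_data))
    print("prediction:", ensemble_predict(toy_data, (0.05, 0.05)))
```

In the jackknife loop each protein is held out once and predicted from the remaining ones, so no sample is ever used to predict itself. A real predictor would replace the toy vectors with features derived from protein sequences.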

Bibliographic details

  • Source
    Protein Engineering Design and Selection | 2007, Issue 1 | pp. 39-46 | 8 pages
  • Author affiliations

    Institute of Image Processing Pattern Recognition Shanghai Jiaotong University 1954 Hua-Shan Road Shanghai 200030 China;

    Gordon Life Science Institute 13784 Torrey Del Mar Drive San Diego CA 92130 USA;

  • Original format: PDF
  • Text language: English
