Journal: Amino Acids

Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction



Abstract

With the avalanche of newly found protein sequences emerging in the post-genomic era, it is highly desirable to develop an automated method for quickly and reliably identifying their subcellular locations, because the knowledge thus obtained can provide key clues for revealing their functions and understanding how they interact with each other in cellular networking. However, predicting the subcellular location of eukaryotic proteins is a challenging problem, particularly when unknown query proteins do not have significant homology to proteins of known subcellular locations and when more locations need to be covered. To cope with the challenge, protein samples are formulated by hybridizing the information derived from the gene ontology database and amphiphilic pseudo amino acid composition. Based on such a representation, a novel ensemble hybridization classifier was developed by fusing many basic individual classifiers through a voting system. Each of these basic classifiers was engineered by the KNN (K-Nearest Neighbor) principle. As a demonstration, a new benchmark dataset was constructed that covers the following 18 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cyanelle, (5) cytoplasm, (6) cytoskeleton, (7) endoplasmic reticulum, (8) extracell, (9) Golgi apparatus, (10) hydrogenosome, (11) lysosome, (12) mitochondria, (13) nucleus, (14) peroxisome, (15) plasma membrane, (16) plastid, (17) spindle pole body, and (18) vacuole. To avoid homology bias, none of the proteins included has ≥25% sequence identity to any other in the same subcellular location. The overall success rates thus obtained via the 5-fold and jackknife cross-validation tests were 81.6% and 80.3%, respectively, which were 40-50% higher than those achieved by the other existing methods on the same strict dataset. The powerful predictor, named "Euk-PLoc", is available as a web server at http://202.120.37.186/bioinf/euk. Furthermore, to support the needs of people working in the relevant areas, a downloadable file will be provided at the same website to list the results predicted by Euk-PLoc for all eukaryotic protein entries (excluding fragments) in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results will be updated twice a year to include new entries of eukaryotic proteins and to reflect the continuous development of Euk-PLoc.
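The voting scheme and the jackknife test mentioned in the abstract can be illustrated with a minimal sketch. This is not the Euk-PLoc implementation: the toy 2-D feature vectors stand in for the gene ontology and amphiphilic pseudo amino acid composition representation, and building the ensemble from KNN classifiers that differ only in k is an assumption made for illustration. All function names here are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Label the query by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several basic KNN classifiers (one per k, an assumed design) by voting."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, ks=(1, 3, 5)):
    """Jackknife (leave-one-out) test: predict each sample from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += ensemble_predict(rest, features, ks) == label
    return hits / len(samples)


# Toy data: two well-separated clusters standing in for two locations.
samples = [
    ((0.0, 0.1), "nucleus"),
    ((0.1, 0.0), "nucleus"),
    ((0.2, 0.2), "nucleus"),
    ((0.1, 0.2), "nucleus"),
    ((1.0, 0.9), "cytoplasm"),
    ((0.9, 1.0), "cytoplasm"),
    ((0.8, 0.8), "cytoplasm"),
    ((0.9, 0.8), "cytoplasm"),
]

print(ensemble_predict(samples, (0.05, 0.05)))
print(jackknife_accuracy(samples))
```

In the jackknife test every sample is held out once and predicted from the remaining ones, so no sample is ever used to classify itself.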
