Journal: Biochemical and Biophysical Research Communications

SPRED: A machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes.

Abstract

Eukaryotic protein secretion generally occurs via the classical secretory pathway, which traverses the endoplasmic reticulum (ER) and the Golgi apparatus. Secreted proteins usually contain a signal sequence carrying all the essential information required to target them for secretion. However, some proteins, such as fibroblast growth factors (FGF-1, FGF-2), interleukins (IL-1 alpha, IL-1 beta), galectins and thioredoxin, are exported by an alternative pathway. This is known as leaderless or non-classical secretion and works without a signal sequence. Most computational methods for the identification of secretory proteins use the signal peptide as an indicator and are therefore unable to identify substrates of non-classical secretion. In this work, we report a random forest method, SPRED, that identifies secretory proteins from protein sequences irrespective of N-terminal signal peptides, thus also allowing correct classification of non-classical secretory proteins. Training was performed on a dataset containing 600 extracellular proteins and 600 cytoplasmic and/or nuclear proteins. The algorithm was tested on 180 extracellular proteins and 1380 cytoplasmic and/or nuclear proteins. We obtained 85.92% accuracy on the training set and 82.18% accuracy on the test set. Since SPRED does not use N-terminal signals, non-classical secreted proteins can be detected by using SignalP to filter out those predicted secreted proteins that carry an N-terminal signal. SPRED correctly predicted 15 out of 19 experimentally verified non-classical secretory proteins. By scanning the entire human proteome, we identified 566 protein sequences potentially undergoing non-classical secretion. The dataset and a standalone version of the SPRED software are available at http://www.inb.uni-luebeck.de/tools-demos/spred/spred.
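
The two-stage filtering logic described in the abstract can be illustrated with a minimal Python sketch. This is not the SPRED implementation: the `Prediction` record, its field names, the label strings and the toy protein identifiers are all hypothetical, and the two boolean fields merely stand in for the outputs of a sequence-based secretion classifier and of a signal-peptide predictor such as SignalP. The sketch only shows how combining the two predictions separates non-classical candidates, and how the reported 15 out of 19 translates into a recovered fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record; field names are assumptions, not SPRED's API."""
    protein_id: str            # made-up identifier for illustration
    predicted_secreted: bool   # stand-in for a sequence-based classifier call
    has_signal_peptide: bool   # stand-in for a signal-peptide predictor call


def classify(p: Prediction) -> str:
    """Combine the two predictions into one of three illustrative labels."""
    if not p.predicted_secreted:
        # Not called secreted by the sequence-based classifier.
        return "non-secreted"
    if p.has_signal_peptide:
        # Called secreted and carries an N-terminal signal.
        return "classical"
    # Called secreted but no N-terminal signal was detected.
    return "non-classical candidate"


def non_classical_candidates(preds):
    """Return the IDs left after filtering out signal-peptide proteins."""
    return [p.protein_id for p in preds
            if classify(p) == "non-classical candidate"]


def fraction_recovered(found: int, total: int) -> float:
    """Fraction of known positives that were predicted correctly."""
    return found / total


# Toy input: values are invented purely to exercise each branch.
toy = [
    Prediction("P_A", True, True),
    Prediction("P_B", True, False),
    Prediction("P_C", False, False),
    Prediction("P_D", False, True),
]

candidates = non_classical_candidates(toy)
recovered_pct = round(fraction_recovered(15, 19) * 100, 1)
```

With the figures given in the abstract, `fraction_recovered(15, 19)` corresponds to roughly 79% of the experimentally verified non-classical secretory proteins.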
