Conference: International Conference on Intelligent Systems for Molecular Biology

An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins



Abstract

Motivation: All-alpha membrane proteins constitute a functionally relevant subset of the whole proteome. Based on sequence comparison and specific predictive methods, they account for about 10 to 30% of the cell proteins. Because few membrane proteins have been solved at atomic resolution, the training/testing sets of predictive methods for protein topography and topology routinely include very few well-solved structures mixed with a hundred proteins known only at low resolution. Moreover, available predictors fail in predicting recently crystallised membrane proteins (Chen et al., 2002). Presently the number of well-solved membrane proteins comprises some 59 chains of low sequence homology. It is therefore possible to train/test predictors only with the set of proteins known at atomic resolution and to evaluate the performance of different methods more thoroughly.

Results: We implement a cascade-neural network (NN), two different hidden Markov models (HMM), and their ensemble (ENSEMBLE) as a new method. We train and test the three methods and ENSEMBLE in cross-validation on the 59 well-resolved membrane proteins. ENSEMBLE scores a per-protein accuracy of 90% for topography and 71% for topology, outperforming the best single method by 7 and 5 percentage points, respectively. When tested on a low-resolution set of 151 proteins with no homology to the 59 proteins, the per-protein accuracy of ENSEMBLE is 76% for topography and 68% for topology. Our results also indicate that the performance of ENSEMBLE is higher than that of the best predictors presently available on the Web.
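The abstract does not say how ENSEMBLE combines the NN and the two HMMs, nor how a protein is judged correct. As a purely illustrative sketch, the Python below assumes a per-residue majority vote over three label strings and a simple per-protein topography criterion (same number of membrane segments, each overlapping its observed counterpart). The label alphabet (`M` for membrane, `i` for inside, `o` for outside), the tie rule, and all function names are assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length label strings by per-residue majority vote.

    Assumed tie rule: fall back to the first predictor's label.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top_label, top_count = counts.most_common(1)[0]
        tied = [lab for lab, c in counts.items() if c == top_count]
        consensus.append(labels[0] if len(tied) > 1 else top_label)
    return "".join(consensus)


def segments(labels, state="M"):
    """Return (start, end) pairs, end exclusive, for each run of `state`."""
    runs, start = [], None
    for i, lab in enumerate(labels):
        if lab == state and start is None:
            start = i
        elif lab != state and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def topography_correct(predicted, observed):
    """Hypothetical criterion: equal segment count and pairwise overlap."""
    pred, obs = segments(predicted), segments(observed)
    if len(pred) != len(obs):
        return False
    return all(p[0] < o[1] and o[0] < p[1] for p, o in zip(pred, obs))


def per_protein_accuracy(pairs):
    """Fraction of (predicted, observed) pairs judged correct."""
    correct = sum(topography_correct(p, o) for p, o in pairs)
    return correct / len(pairs)


# Toy example with made-up label strings (not data from the paper).
nn_pred = "iiMMMMoooMMMii"
hmm1_pred = "iiiMMMoooMMMMi"
hmm2_pred = "iiMMMMooooMMii"
consensus = majority_vote([nn_pred, hmm1_pred, hmm2_pred])
```

In this toy case the consensus recovers two membrane segments even though the three inputs disagree at segment boundaries, which is the kind of gain an ensemble might provide. The actual combination scheme and scoring rules should be taken from the full paper.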
