Journal of molecular graphics & modelling

Predictive Bayesian neural network models of MHC class II peptide binding

Abstract

We used Bayesian regularized neural networks to model data on the MHC class II-binding affinity of peptides. Training data consisted of sequences and binding data for nonamer (nine amino acid) peptides. Independent test data consisted of sequences and binding data for peptides of length ≤25. We assumed that the MHC class II-binding activity of a peptide depends only on its highest ranked embedded nonamer and that reverse sequences of active nonamers are inactive. We also validated the models internally by placing 30% of the training data in an internal test set. We obtained robust models, with near-identical statistics across multiple training runs. We assessed how predictive our models were using statistical tests and the area under the Receiver Operating Characteristic (ROC) curve (AROC). Most models gave training AROC values close to 1.0 and test set AROC values >0.8. We used both amino acid indicator variables (bin20) and property-based descriptors to generate models for MHC class II-binding of peptides. The property-based descriptors were more parsimonious than the indicator variable descriptors, making them applicable to larger peptides, and their design allows them to generalize to unknown peptides outside the training space. None of the external test data sets contained any of the nonamer sequences in the training sets. Consequently, the models attempted to predict the activity of truly unknown peptides not encountered in the training sets. Our models correctly predicted the MHC class II-binding activities of a majority of the test set peptides, which is a difficult problem. Likely explanations for some misclassifications are exceptions to the assumption that nonamer motif activities are invariant to the peptide in which they are embedded, the limited coverage of the test data, and the fuzziness of the classification procedure.
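
The scoring assumption in the abstract (a peptide is as active as its highest ranked embedded nonamer, and the reverse of an active nonamer is inactive) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that bin20 means 20 binary indicator inputs per residue position, and the names `bin20_encode`, `embedded_nonamers`, `peptide_score`, `reversed_decoy`, and the `score_nonamer` callable (a stand-in for a trained network) are all hypothetical.

```python
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def bin20_encode(nonamer: str) -> List[int]:
    """Indicator encoding (assumed meaning of bin20): 20 binary inputs per position."""
    if len(nonamer) != 9:
        raise ValueError("expected a nine-residue peptide")
    vector: List[int] = []
    for residue in nonamer:
        one_hot = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for non-standard residue codes
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector  # 9 positions x 20 indicators = 180 inputs


def embedded_nonamers(peptide: str) -> List[str]:
    """All overlapping nine-residue windows of a peptide."""
    return [peptide[i:i + 9] for i in range(len(peptide) - 8)]


def peptide_score(peptide: str, score_nonamer: Callable[[List[int]], float]) -> float:
    """Score a peptide by its highest ranked embedded nonamer."""
    windows = embedded_nonamers(peptide)
    if not windows:
        raise ValueError("peptide shorter than nine residues")
    return max(score_nonamer(bin20_encode(w)) for w in windows)


def reversed_decoy(nonamer: str) -> str:
    """Reverse sequence of an active nonamer, assumed inactive in the abstract."""
    return nonamer[::-1]
```

Here `score_nonamer` would be replaced by whatever trained model maps a 180-element indicator vector to a predicted binding activity. A property-based descriptor would swap out `bin20_encode` for a shorter per-residue feature vector.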
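
The AROC statistic cited in the abstract is the area under the ROC curve. One standard way to compute it is the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder, with ties counted as one half. The sketch below assumes binary labels (1 for binder, 0 for non-binder); the function name `aroc` is hypothetical, and the abstract does not say how the authors computed the value.

```python
from typing import Sequence


def aroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of binders and non-binders."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # binder ranked above non-binder
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(positives) * len(negatives))
```

A value of 1.0 means every binder outranks every non-binder, and 0.5 corresponds to random ranking, so the reported test set values above 0.8 indicate ranking well above chance.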