Journal: Artificial Intelligence in Medicine

Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method

Abstract

Objective: Accurate prediction of major histocompatibility complex (MHC) class II binding peptides helps reduce the experimental cost of identifying helper T cell epitopes. The problem has been challenging partly because the binding peptides vary in length. This work aims to develop an accurate model for predicting MHC-binding peptides using machine learning methods.

Methods: A machine learning method, continuous kernel discrimination (CKD), was used to predict MHC class II binders of variable lengths. Composition, transition, and distribution features were used to encode the peptide sequences, and a Metropolis Monte Carlo simulated annealing approach was used for feature selection.

Results: Feature selection was found to significantly improve the performance of the model. For benchmark dataset Dataset-1, the number of features was reduced from 147 to 24 and the area under the receiver operating characteristic curve (AUC) improved from 0.8088 to 0.9034; for benchmark dataset Dataset-2, the number of features was reduced from 147 to 44 and the AUC improved from 0.7349 to 0.8499. An optimal CKD model was derived from feature selection and bandwidth optimization using 10-fold cross-validation. For MHC class II alleles, its AUC values range from 0.831 to 0.980 on the benchmark datasets BM-Set1 and from 0.806 to 0.949 on the benchmark datasets BM-Set2. These results indicate significantly better performance for the CKD model than for earlier models trained and tested on the same datasets.

Conclusions: Our study suggests that the CKD method outperforms other machine learning methods proposed earlier for the prediction of MHC class II binding peptides. Moreover, the choice of the cut-off for the CKD classifier is crucial for its performance.
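The abstract does not give the CKD formula, so the following is only a minimal sketch of how a kernel-discrimination score and its AUC might be computed. It assumes a Gaussian kernel with a single bandwidth and a score equal to the binders' share of the total kernel weight around a query. The function names (`gaussian_kernel`, `ckd_score`, `classify`, `auc`), the two-dimensional toy vectors, and the bandwidth value are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Isotropic Gaussian kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def ckd_score(query, binders, non_binders, bandwidth):
    """Share of kernel weight around `query` contributed by known binders.

    Assumed form (not from the paper): sum of kernels over binders divided
    by the sum of kernels over all training peptides.
    """
    k_pos = sum(gaussian_kernel(query, b, bandwidth) for b in binders)
    k_neg = sum(gaussian_kernel(query, n, bandwidth) for n in non_binders)
    total = k_pos + k_neg
    # If every kernel underflows to zero, fall back to an uninformative score.
    return k_pos / total if total > 0 else 0.5


def classify(score, cutoff):
    """Turn a continuous score into a binder / non-binder call."""
    return 1 if score >= cutoff else 0


def auc(scores, labels):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy feature vectors standing in for encoded peptides (hypothetical data).
train_binders = [(0.0, 0.0), (0.1, 0.2)]
train_non_binders = [(1.0, 1.0), (0.9, 1.1)]
test_points = [(0.05, 0.1), (0.2, 0.1), (1.0, 1.05), (0.8, 0.9)]
test_labels = [1, 1, 0, 0]

test_scores = [
    ckd_score(p, train_binders, train_non_binders, bandwidth=0.3)
    for p in test_points
]
toy_auc = auc(test_scores, test_labels)
```

In this sketch the bandwidth and the cut-off passed to `classify` are the two tunable quantities. That mirrors the abstract's remarks that bandwidth was optimized by cross-validation and that the cut-off choice matters, but the values used here are arbitrary.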