
Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences

Abstract

© 2016 Cai and Jiang.

Background: Ubiquitination is an important post-translational modification of proteins and has been widely investigated by biologists. Both experimental and computational methods have been developed to identify ubiquitination sites in protein sequences. This paper explores machine learning methods for predicting ubiquitination sites from the physicochemical properties (PCPs) of the amino acids in protein sequences.

Results: We first establish six ubiquitination data sets whose records contain both ubiquitination sites and non-ubiquitination sites in varying numbers of protein sequence segments. To build these data sets, protein sequence segments are extracted from the original protein sequences used in four published papers on ubiquitination, and 531 PCP features are calculated for each extracted segment by averaging, over all amino acids in the segment, the PCP values taken from AAindex (the Amino Acid index database). Several machine learning methods are then applied to the six segment-PCP data sets: four Bayesian network methods (Naïve Bayes (NB), Feature Selection NB (FSNB), Model Averaged NB (MANB), and Efficient Bayesian Multivariate Classifier (EBMC)) and three regression methods (Support Vector Machine (SVM), Logistic Regression (LR), and Least Absolute Shrinkage and Selection Operator (LASSO)). Five-fold cross-validation and the area under the receiver operating characteristic curve (AUROC) are used to evaluate the prediction performance of each method. The results demonstrate that the PCP data of protein sequences contain information that machine learning methods can mine for ubiquitination site prediction. In the comparison, EBMC, SVM and LR perform better than the other methods, and EBMC is the only method that achieves an AUROC greater than or equal to 0.6 on all six data sets. EBMC also tends to perform better on larger data sets.

Conclusions: Machine learning methods have been applied to ubiquitination site prediction based on the physicochemical properties of the amino acids in protein sequences. The results demonstrate that machine learning can extract useful information from the PCP data of protein sequences, and that EBMC, SVM and LR (especially EBMC) outperform the other methods for ubiquitination prediction.
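The two computational steps named in the abstract, averaging per-residue property values over a segment and scoring predictions by AUROC, can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It uses two example properties (Kyte-Doolittle hydropathy and a simplified charge indicator) in place of the 531 AAindex-derived features. The names `PROPERTY_TABLES`, `segment_features` and `auroc` are hypothetical, and skipping non-standard residues is an assumption the abstract does not state.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, used here only as one example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified charge indicator (assumption: histidine is treated as neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

# Hypothetical stand-in for the 531 AAindex-derived property tables.
PROPERTY_TABLES = {"hydropathy": HYDROPATHY, "charge": CHARGE}


def segment_features(segment):
    """Average each property over the residues of one sequence segment."""
    features = {}
    for name, table in PROPERTY_TABLES.items():
        # Assumption: residues missing from the table (e.g. 'X') are skipped.
        values = [table[aa] for aa in segment.upper() if aa in table]
        if not values:
            raise ValueError("segment contains no standard residues")
        features[name] = mean(values)
    return features


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy segment centred on a lysine; the sequence is made up for illustration.
    print(segment_features("AGLKDEV"))
    # Toy classifier scores and labels (1 = ubiquitination site).
    print(auroc([0.9, 0.4, 0.6, 0.2], [1, 0, 1, 0]))
```

In a five-fold cross-validation setting, `auroc` would be computed on each held-out fold and the five values averaged. The pairwise formulation is quadratic in the number of records, so a rank-based implementation would be preferable for large data sets.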

Bibliographic details

  • Authors: Cai B; Jiang X
  • Year: 2016
  • Original format: PDF
  • Language of text: en
