Journal: BMC Bioinformatics
A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs

Abstract

Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Because mass spectrometry (MS)-based proteomics has identified an increasing number of O-GlcNAcylated peptides with site-specific information, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is now freely available at .
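The abstract reports sensitivity, specificity, and accuracy. As a minimal sketch of how these three metrics are conventionally computed from binary predictions, the Python below uses the standard definitions. The function names and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of binary classification outcomes."""
    tp: int  # true positives: real sites predicted as sites
    fp: int  # false positives: non-sites predicted as sites
    tn: int  # true negatives: non-sites predicted as non-sites
    fn: int  # false negatives: real sites predicted as non-sites


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally outcomes for labels where 1 = O-GlcNAcylated site, 0 = not."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def sensitivity(c: Confusion) -> float:
    """Fraction of real sites that were recovered: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of non-sites correctly rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Fraction of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical toy labels, NOT taken from the paper's dataset.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_counts = confusion(toy_true, toy_pred)
```

In a five-fold cross-validation, these counts would typically be computed on each held-out fold and then pooled or averaged. The abstract does not say which of the two the authors used.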
