Journal: Bioinformatics

Motifs tree: a new method for predicting post-translational modifications



Abstract

Motivation: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, ranging from manually detected patterns to machine learning methods. On the one hand, manual pattern detection does not yield the most efficient classifiers and requires a substantial workload; on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. Therefore, we developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white box classifiers. Our method was tested on initiator methionine cleavage (IMC) and N-alpha-terminal acetylation (N-Ac), two of the most common PTMs.

Results: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers display better performance than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. These results confirm that our method produces white box models.
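To make two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a motif can be written as a regular expression anchored at the N-terminus, scores one candidate motif as a decision-tree split using the gain ratio (the split criterion used by C4.5), and computes the Matthews correlation coefficient from confusion-matrix counts. The function names, the toy sequences, and the labels are invented for illustration; the genetic search over motifs described in the abstract is not shown.

```python
import math
import re
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(sequences, labels, motif):
    """Gain ratio of the binary test 'sequence starts with this motif'.

    Assumption: the motif is a regular expression matched at the
    start of the sequence (hypothetical encoding, not from the paper).
    """
    pattern = re.compile(motif)
    hit = [y for s, y in zip(sequences, labels) if pattern.match(s)]
    miss = [y for s, y in zip(sequences, labels) if not pattern.match(s)]
    if not hit or not miss:
        return 0.0  # the motif does not split the data at all
    total = len(labels)
    remainder = (len(hit) / total) * entropy(hit) \
        + (len(miss) / total) * entropy(miss)
    gain = entropy(labels) - remainder
    # Split information penalises very unbalanced partitions.
    split_info = entropy(["hit"] * len(hit) + ["miss"] * len(miss))
    return gain / split_info


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data invented for illustration: 1 = modified, 0 = not modified.
toy_sequences = ["MASK", "MGLT", "MKVL", "MLEE"]
toy_labels = [1, 1, 0, 0]

if __name__ == "__main__":
    for candidate in ("M[AGS]", "M[AK]"):
        print(candidate, gain_ratio(toy_sequences, toy_labels, candidate))
    print("MCC:", mcc(tp=50, tn=40, fp=5, fn=5))
```

In a search such as the one the abstract describes, a score like `gain_ratio` could serve as the fitness of a candidate motif, with the best-scoring motif becoming the test at a tree node; that coupling is an assumption here, since the abstract does not give the details.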
