Journal: BMC Bioinformatics

A simplified approach to disulfide connectivity prediction from protein sequences


Abstract

Background: Predicting disulfide bridges from protein sequences is useful for characterizing the structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to this problem, and public-domain prediction services exist. However, these methods still leave room for significant improvement in both prediction accuracy and overall architectural complexity.

Results: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels that measure the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors.

Conclusion: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy of our connectivity prediction method compares favorably with all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and to overestimation of prediction accuracy.
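The two-pass cascade described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the two decomposition kernels or name the binary classifier, so the sketch substitutes a simple window-match kernel over cysteine environments and uses 1-nearest neighbor for both passes. The function names, the window half-width `k`, the rule that only training chains with the same number of cysteines are compared in the second pass, and the toy sequences are all assumptions made for illustration.

```python
from math import sqrt
from typing import List, Optional, Tuple

# A connectivity pattern: pairs of cysteine ranks (0 = first cysteine in the chain).
Pattern = Tuple[Tuple[int, int], ...]
# A training example: (sequence, has at least one bridge, connectivity pattern).
Example = Tuple[str, bool, Pattern]


def cysteine_windows(seq: str, k: int = 3) -> List[str]:
    """Return the window of half-width k around each cysteine, padded with '-'."""
    padded = "-" * k + seq + "-" * k
    return [padded[i:i + 2 * k + 1] for i, aa in enumerate(seq) if aa == "C"]


def window_matches(a: str, b: str) -> int:
    """Count positions where two windows carry the same residue (padding ignored)."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def raw_kernel(s: str, t: str, k: int = 3) -> float:
    """Stand-in kernel (assumption): sum of window matches over all cysteine pairs."""
    return float(sum(window_matches(a, b)
                     for a in cysteine_windows(s, k)
                     for b in cysteine_windows(t, k)))


def kernel(s: str, t: str, k: int = 3) -> float:
    """Cosine-normalized kernel; 0.0 if either chain has no cysteine environment."""
    norm = sqrt(raw_kernel(s, s, k) * raw_kernel(t, t, k))
    return raw_kernel(s, t, k) / norm if norm > 0 else 0.0


def predict_connectivity(query: str, train: List[Example]) -> Optional[Pattern]:
    """Cascade the two passes; return () for 'no bridge', None if no comparable chain."""
    # Pass 1 (stand-in): label the chain by its most similar training chain.
    nearest = max(train, key=lambda ex: kernel(query, ex[0]))
    if not nearest[1]:
        return ()
    # Pass 2: 1-nearest neighbor among bonded chains with the same cysteine count.
    n_cys = query.count("C")
    candidates = [ex for ex in train if ex[1] and ex[0].count("C") == n_cys]
    if not candidates:
        return None
    return max(candidates, key=lambda ex: kernel(query, ex[0]))[2]


# Toy training set: invented sequences and labels, not real proteins.
TRAIN: List[Example] = [
    ("AACKLCGGCPPCA", True, ((0, 1), (2, 3))),
    ("MKCDEFCGHICKLC", True, ((0, 3), (1, 2))),
    ("GHCTSAVC", False, ()),
]

if __name__ == "__main__":
    print(predict_connectivity("AACKLCGGCPPCA", TRAIN))
```

Because both passes reduce to a similarity lookup, the sketch has no parameters to fit beyond the fixed window width, which loosely mirrors the abstract's point that the method needs no model selection or hyperparameter tuning.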
