Journal: Biotechnology & Biotechnological Equipment

A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods

Abstract

RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for detecting the structure of RBPs and inferring their function for new drug design. The protein sequence, as basic information, has been widely used in protein research in combination with machine learning techniques. Here, we propose a sequence-based method to predict RNA-interacting residues in protein sequences. The method combines two predictors: a feature-based predictor and a sequence template-based predictor. The feature-based predictor applies a random forest (RF) classifier to protein sequence information. After the classification probability is obtained, an adjustment procedure is applied that takes into account the correlation between neighbouring RNA-interacting residues. The sequence template-based predictor selects the optimal template for the query sequence by multiple sequence alignment and maps the interacting residues of the template sequence onto the query sequence. Combining the two predictors greatly improves the coverage and prediction performance of our method: the MCC value increases from 0.467 and 0.352 to 0.499 on our validation set. To evaluate the proposed method, an independent testing set is used to compare it with two other hybrid methods. Our method achieves better performance than the other two methods, with an overall accuracy of 0.817, an MCC value of 0.511 and an F-score of 0.605, which demonstrates that it can reliably predict RNA-interacting residues in protein sequences. Moreover, the effectiveness of our newly proposed adjustment procedure in the feature-based predictor is examined and analysed in detail.
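
The abstract reports overall accuracy, MCC (Matthews correlation coefficient) and F-score without restating how they are computed. As a minimal sketch, assuming the standard confusion-matrix definitions and per-residue labels of 1 (RNA-interacting) and 0 (non-interacting), the three measures could be calculated as follows. The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return (accuracy, MCC, F-score) for binary residue labels.

    Labels are assumed to be 1 for an interacting residue and 0 otherwise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F-score (F1) = 2*TP / (2*TP + FP + FN)
    f_denom = 2 * tp + fp + fn
    f_score = 2 * tp / f_denom if f_denom else 0.0

    return accuracy, mcc, f_score


# Toy example with made-up labels for eight residues.
true_labels = [1, 1, 0, 0, 1, 0, 0, 0]
pred_labels = [1, 0, 0, 0, 1, 1, 0, 0]
acc, mcc, f1 = binary_metrics(true_labels, pred_labels)
print(f"accuracy={acc:.3f} MCC={mcc:.3f} F-score={f1:.3f}")
```

Because interacting residues are usually a minority class, MCC and F-score are generally more informative than accuracy alone, which may be why the abstract reports all three.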