Journal: BioMed Research International

Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection



Abstract

The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although several studies have addressed this problem, prediction accuracy remains insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests, with features ranked by the minimum redundancy maximum relevance (mRMR) method and then chosen by incremental feature selection (IFS). We combined conjoint triad features with three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play an important role in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). The high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
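The two performance measures quoted in the abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from the counts in a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper):
    # RNA-binding proteins are the positive class.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, so it is less easily inflated than accuracy when the binding and nonbinding classes differ in size.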
