U.S. National Institutes of Health literature > PLoS Clinical Trials

DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues



Abstract

DNA-binding proteins are fundamentally important in cellular processes. Several computational methods have been developed in recent years to improve the prediction of DNA-binding proteins. However, insufficient work has been done on predicting DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using a random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features: one reflects the conservation of the physicochemical properties of the amino acids, and the other reflects the binding propensity of DNA-binding residues and the non-binding propensity of non-binding residues. Comparisons with each individual feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model achieved 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. The high prediction accuracy and the performance comparisons with previous research suggest that DNABP could be a useful approach for identifying DNA-binding proteins from sequence information. The DNABP web server system is freely available at .
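
The four performance measures reported above (accuracy, sensitivity, specificity and the Matthews correlation coefficient) are standard quantities derived from a binary confusion matrix. The following is a minimal sketch of how they are computed, not the authors' evaluation code. The function name `binary_metrics` and the counts in the usage lines are hypothetical and chosen for illustration only; they are not the paper's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one actual positive (tp + fn > 0) and one actual
    negative (tn + fp > 0); otherwise sensitivity/specificity are undefined.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=84, fn=16, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a useful complement when the positive and negative classes differ in size.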
