Journal: Genes

A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers

Abstract

Various machine learning-based approaches that use sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. In these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most critical tasks. Disclosing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance, but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework that orchestrates multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and auto-cross covariance of secondary structures (AC_Struc), which are extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. A logistic regression model is then applied to evaluate the contributions of these individual classifiers and to make the final prediction. When evaluated on the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various prediction models flexibly with good performance.
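
The two-level structure described in the abstract (one base model per feature view, then a logistic regression over the base-model outputs) can be illustrated with a minimal sketch. This is not the MSFBinder implementation. Everything below is an assumption made for illustration: the two views `view_a` and `view_b` are synthetic stand-ins for feature sets such as Local_DPP or 188D, the base learners are simple logistic regressions rather than SVM or random forest, and the out-of-fold scoring scheme and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w = [bias, w1, ..., wk]; returns the estimated P(y = 1 | x).
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w1, ..., wk]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def make_toy_data(n, rng):
    """Synthetic two-view data (an assumption): view_a is informative, view_b is weak."""
    y = [i % 2 for i in range(n)]
    view_a = [[rng.gauss(2.0 * yi - 1.0, 1.0)] for yi in y]
    view_b = [[rng.gauss(0.2 * yi, 1.0)] for yi in y]
    return {"view_a": view_a, "view_b": view_b}, y


def out_of_fold_scores(X, y, k=5):
    """Score every sample with a base model that did not see it during training."""
    scores = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(y), k):
            scores[i] = predict_proba(w, X[i])
    return scores


rng = random.Random(0)
views, y = make_toy_data(200, rng)
names = sorted(views)

# Level 0: one base model per feature view, scored out of fold.
oof = {name: out_of_fold_scores(views[name], y) for name in names}

# Level 1: logistic regression over the base-model scores.
meta_X = [[oof[name][i] for name in names] for i in range(len(y))]
meta_w = fit_logistic(meta_X, y)

# The meta-model weights give a rough indication of each base model's contribution.
contributions = dict(zip(names, meta_w[1:]))
meta_pred = [int(predict_proba(meta_w, x) >= 0.5) for x in meta_X]
accuracy = sum(p == t for p, t in zip(meta_pred, y)) / len(y)

print("meta-model weights per view:", contributions)
print("stacked accuracy on the toy training data:", accuracy)
```

Training the second-level model on out-of-fold scores, rather than on scores the base models produce for their own training samples, is one common way to keep the combiner from rewarding overfitted base models. The learned weights are only a rough guide to contribution, since they also depend on the scale and spread of each base model's scores.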