Journal: Proteins: Structure, Function, and Genetics

Enabling full‐length evolutionary profiles based deep convolutional neural network for predicting DNA‐binding proteins from sequence



Abstract

Sequence-based prediction of DNA-binding proteins (DBPs) is a widely studied biological problem. Sliding windows over the rows of position-specific substitution matrices (PSSMs) predict DNA-binding residues well in known DBPs, but the same models cannot be applied to protein sequences of unequal length. PSSM summaries, namely column averages and their amino-acid-wise versions, have been used effectively for the task, but it remains unclear whether these features carry all of the PSSM's predictive power, which has traditionally been harnessed for binding-site prediction. Here we evaluate whether PSSMs scaled up to a fixed size by zero-vector padding (pPSSM) perform better than summary-based features on similar models. Using a multilayer perceptron (MLP) and a deep convolutional neural network (CNN), we found that (a) summary features work well for single-genome (human-only) data but are outperformed by pPSSM on diverse PDB-derived data sets, suggesting greater summary-level redundancy in the former; (b) even when summary features perform comparably to pPSSM, a consensus of the two outperforms both; (c) CNN models comprehensively outperform their corresponding MLP models; and (d) the actual predicted scores from different models depend on the choice of input feature set, whereas overall performance levels are model-dependent, with CNN achieving the highest accuracy.
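The two feature types compared in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `max_len` parameter, the truncation of over-long sequences, the reading of "amino-acid-wise" summaries as per-residue-type row averages, and the use of a plain mean as the consensus are all assumptions made here for illustration.

```python
import numpy as np

# Assumed column/residue ordering; the abstract does not specify one.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pad_pssm(pssm, max_len):
    """Zero-pad an (L x 20) PSSM to a fixed (max_len x 20) array (pPSSM).

    Rows beyond max_len are dropped; that truncation is an assumption,
    since the abstract only mentions padding.
    """
    pssm = np.asarray(pssm, dtype=float)
    padded = np.zeros((max_len, pssm.shape[1]))
    n = min(pssm.shape[0], max_len)
    padded[:n] = pssm[:n]
    return padded


def column_average(pssm):
    """20-dimensional summary: the mean of each PSSM column."""
    return np.asarray(pssm, dtype=float).mean(axis=0)


def residue_wise_average(pssm, sequence):
    """400-dimensional summary: for each residue type, the mean PSSM row
    over the positions where that residue occurs (zeros if it is absent)."""
    pssm = np.asarray(pssm, dtype=float)
    out = np.zeros((len(AMINO_ACIDS), pssm.shape[1]))
    for i, aa in enumerate(AMINO_ACIDS):
        mask = np.array([c == aa for c in sequence], dtype=bool)
        if mask.any():
            out[i] = pssm[mask].mean(axis=0)
    return out.ravel()


def consensus_score(score_summary, score_ppssm):
    """Hypothetical consensus: the plain mean of two model scores."""
    return 0.5 * (score_summary + score_ppssm)
```

The padded array has the same shape for every protein, so it can be fed to a CNN or flattened for an MLP, while the summary vectors are length-independent by construction.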
