...
Journal: BMC Bioinformatics

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Abstract

Background: DNA-binding proteins perform important functions in a great number of biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.

Results: Our results suggest that some sequence features can significantly differentiate DSBs from SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an area under the curve (AUC) of 0.919. Moreover, our method performs well in independent testing.

Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could be helpful for discovering the binding specificity of DNA-binding proteins.
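To make the composition-based representations named in the abstract more concrete, here is a minimal Python sketch of three of them: overall amino acid composition, dipeptide composition, and split amino acid composition. It is not the authors' implementation. The function names, the handling of non-standard residues, and the 25-residue N- and C-terminal segment lengths used for SAA are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq):
    """Fraction of each standard amino acid in seq (20 values).

    Non-standard characters are ignored (an assumption of this sketch).
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent standard residues (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def split_aa_composition(seq, n_term=25, c_term=25):
    """Composition of the N-terminal, middle, and C-terminal segments (60 values).

    The segment lengths n_term and c_term are hypothetical defaults.
    """
    head = seq[:n_term]
    rest = seq[n_term:]
    cut = max(0, len(rest) - c_term)
    middle, tail = rest[:cut], rest[cut:]
    return aa_composition(head) + aa_composition(middle) + aa_composition(tail)


def sequence_features(seq):
    """Concatenate the three representations into one 480-value vector."""
    return aa_composition(seq) + dipeptide_composition(seq) + split_aa_composition(seq)
```

Vectors built this way could then be passed to any standard SVM or random forest implementation and scored with 10-fold cross-validation. PSSM profiles are left out because they require an external sequence-alignment search against a protein database.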