Journal: BMC Systems Biology

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

Abstract

Background: DNA-binding proteins play a pivotal role in intra- and extracellular activities ranging from DNA replication to gene expression control. Identifying DNA-binding proteins is one of the major challenges in genome annotation. Several computational methods have been proposed in the literature for this task. However, most of them do not provide a knowledge base that advances our understanding of DNA-protein interactions.

Results: We first present a new protein sequence encoding method, PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting representations are fed into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on the recently constructed independent dataset PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, outperforming some existing state-of-the-art methods.

Conclusions: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
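The abstract does not give the formula for the distance transformation, so the following is only a minimal sketch of how a variable-length PSSM profile could be turned into a uniform numeric representation. It assumes the transformation averages products of PSSM scores for pairs of columns separated by a fixed gap along the sequence. The function name `pssm_distance_transform` and the parameter `max_gap` are hypothetical, and the PSI-BLAST search and SVM training steps are not shown.

```python
from itertools import product


def pssm_distance_transform(pssm, max_gap):
    """Turn an L x C score matrix into a fixed-length feature list.

    pssm    : list of L rows (one per residue), each a list of C scores
              (C would be 20 for a real PSSM profile).
    max_gap : largest residue separation considered (assumed parameter).

    For every gap g in 1..max_gap and every ordered column pair (a, b),
    the feature is the mean of pssm[j][a] * pssm[j + g][b] over all
    positions j where both residues exist. Pairs with a == b relate a
    column to itself; pairs with a != b relate two different columns.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("pssm must contain at least one row")
    if not 1 <= max_gap < length:
        raise ValueError("max_gap must satisfy 1 <= max_gap < sequence length")

    n_cols = len(pssm[0])
    features = []
    for gap in range(1, max_gap + 1):
        n_pairs = length - gap  # number of residue pairs separated by `gap`
        for a, b in product(range(n_cols), repeat=2):
            total = sum(pssm[j][a] * pssm[j + gap][b] for j in range(n_pairs))
            features.append(total / n_pairs)
    return features


# Toy profile with 3 residues and 2 columns (a real PSSM has 20 columns).
toy = [[1, 2], [3, 4], [5, 6]]
vector = pssm_distance_transform(toy, max_gap=1)
```

Under these assumptions the output length is `max_gap * C * C` regardless of sequence length, which is what allows proteins of different lengths to be passed to the same SVM classifier.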