BMC Genomics

Performance of rotation forest ensemble classifier and feature extractor in predicting protein interactions using amino acid sequences



Abstract

BACKGROUND: Predicting protein-protein interactions from amino acid sequences poses two significant problems. The first is representing each sequence as a feature vector, and the second is designing a model that can identify the protein interactions. Effective feature extraction methods can therefore lead to improved model performance. In this study, we used two feature extraction methods, global encoding and pseudo-substitution matrix representation (PseudoSMR), to represent the sequences of amino acids in human proteins and in Human Immunodeficiency Virus type 1 (HIV-1), in order to address the classification problem of predicting protein-protein interactions. We also compared principal component analysis (PCA) with independent principal component analysis (IPCA) as transformation methods for Rotation Forest.

RESULTS: The results show that global encoding and PseudoSMR, used as feature extraction methods, successfully represent the amino acid sequences for the Rotation Forest classifier with either PCA or IPCA. This can be seen from the evaluation metrics, which were > 73% across the six different parameters. The accuracy of both methods was > 74%. The results for the other model performance criteria, such as sensitivity, specificity, precision, and F1-score, were all 73%. The data used in this study can be accessed at https://www.dsc.ui.ac.id/research/amino-acid-pred/.

CONCLUSIONS: Both global encoding and PseudoSMR can successfully represent the sequences of amino acids. Rotation Forest (PCA) performed better than Rotation Forest (IPCA) in predicting protein-protein interactions between HIV-1 and human proteins. Both the Rotation Forest (PCA) classifier and the Rotation Forest (IPCA) classifier performed better than other classifiers, such as Gradient Boosting, K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine (SVM). Rotation Forest (PCA) and Rotation Forest (IPCA) have accuracy, sensitivity, specificity, precision, and F1-score values > 70%, while the other classifiers have values < 70%.
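
For readers less familiar with the five evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical illustrations; they are not taken from the study and do not reproduce its results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives (e.g. interacting vs.
    non-interacting protein pairs).
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen only for illustration.
metrics = binary_metrics(tp=80, fp=30, tn=70, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier favors one class over the other, which accuracy alone can hide.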
