Journal: BMC Bioinformatics

Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble


Abstract

Antifreeze proteins (AFPs) are a group of proteins that inhibit the growth of ice crystals in body fluids and thus improve an organism's antifreeze ability. They are vital to the survival of living organisms in extremely cold environments. However, little research has addressed sequence feature extraction and selection for antifreeze protein classification, even though this is of great significance for structure and function prediction. In this paper, to predict antifreeze proteins, we propose a feature representation, weighted generalized dipeptide composition (W-GDipC), and a two-stage, multi-regression ensemble feature selection method (LRMR-Ri). In the first stage of LRMR-Ri, four feature selection algorithms (Lasso regression, Ridge regression, maximal information coefficient (MIC) and Relief) each select a feature set. If the four sets share a common feature subset, that subset is taken as the optimal subset; otherwise, in the second stage, Ridge regression selects the optimal subset from the set pooled from the four sets. LRMR-Ri combined with W-GDipC was evaluated on an antifreeze protein dataset (binary classification) and on a membrane protein dataset (multi-class classification). Experimental results show that the method performs well with support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD) classifiers. On the antifreeze protein dataset with the SVM classifier, LRMR-Ri with W-GDipC reached ACC, RE and MCC values of 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method (Lasso, Ridge, MIC and Relief), and nearly 13% higher than Lasso alone for ACC. These results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze protein prediction compared with similar single feature selection methods. In addition, the method also achieved good results in the classification and prediction of membrane proteins, which supports its broad reliability to a certain extent.
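The two-stage selection logic described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how Ridge regression picks the final subset, so ranking pooled features by absolute ridge coefficient and keeping the top `k` is an assumption, as are the names `ridge_coefficients`, `two_stage_select`, `k` and `alpha`. The four first-stage sets are taken as given inputs, and the pooled set is assumed to be their union.

```python
from typing import List, Sequence, Set


def ridge_coefficients(X: Sequence[Sequence[float]], y: Sequence[float],
                       alpha: float = 1.0) -> List[float]:
    """Closed-form ridge fit: solve (X^T X + alpha*I) w = X^T y."""
    n = len(X[0])
    # Build the regularized normal equations A w = b.
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def two_stage_select(candidate_sets: Sequence[Set[int]],
                     X: Sequence[Sequence[float]], y: Sequence[float],
                     k: int, alpha: float = 1.0) -> List[int]:
    """Return feature indices chosen by the two-stage rule.

    candidate_sets: feature-index sets from the first-stage selectors
    (Lasso, Ridge, MIC and Relief in the abstract).
    """
    sets = [set(s) for s in candidate_sets]
    # Stage 1: if all selectors agree on some features, use those.
    common = set.intersection(*sets)
    if common:
        return sorted(common)
    # Stage 2 (assumed form): pool the sets, fit ridge on the pooled
    # columns, and keep the k features with the largest |coefficient|.
    pooled = sorted(set.union(*sets))
    X_pooled = [[row[j] for j in pooled] for row in X]
    weights = ridge_coefficients(X_pooled, y, alpha)
    ranked = sorted(zip(pooled, weights), key=lambda p: abs(p[1]), reverse=True)
    return sorted(j for j, _ in ranked[:k])
```

When the four sets overlap, `X`, `y` and `k` are never used and the intersection is returned directly; the ridge step only runs when the intersection is empty.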
