Journal: BioSystems
ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features

Abstract

Accurate prediction of protein subnuclear localization depends on combining informative features with sound classifier design. Learning methods based on the support vector machine (SVM) have been shown to be effective for predicting protein subcellular and subnuclear localization. This study proposes an evolutionary support vector machine (ESVM) based classifier that automatically selects from a large set of physicochemical composition (PCC) features, yielding an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM combines an inheritable genetic algorithm with SVM, so it can automatically determine the best number m of PCC features while simultaneously identifying which m of the 526 PCC features to use. To evaluate ESVM, this study uses two datasets, SNL6 and SNL9, which contain 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments, respectively. Under leave-one-out cross-validation, ProLoc with the selected m = 33 and m = 28 PCC features achieves accuracies of 56.37% on SNL6 and 72.82% on SNL9, respectively. These exceed the 51.4% obtained on SNL6 by an SVM-based system using k-peptide composition features and the 64.32% obtained on SNL9 by an optimized evidence-theoretic k-nearest neighbor classifier using pseudo amino acid composition.
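
The abstract describes a wrapper approach: a genetic algorithm searches over feature subsets, and each candidate subset is scored by the leave-one-out accuracy of a classifier trained on it. The Python sketch below illustrates that general idea only and is not the ProLoc implementation. It makes several assumptions not found in the abstract: a plain genetic algorithm (truncation selection, one-point crossover, bit-flip mutation) stands in for the inheritable genetic algorithm, a 1-nearest-neighbor classifier stands in for the SVM so that the example needs only the standard library, and the data are synthetic. All function names and parameter values are hypothetical.

```python
import random


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that uses
    only the features j with mask[j] == 1 (a stand-in for the SVM)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def select_features(X, y, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Plain GA over binary feature masks (assumed operators, not the IGA)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        # Higher accuracy first; ties are broken in favour of fewer features.
        return (loo_accuracy(X, y, mask), -sum(mask))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, loo_accuracy(X, y, best)


def make_toy_data(n_samples=30, n_features=8, seed=1):
    """Synthetic two-class data: features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4 * label
        row[1] -= 4 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
mask, acc = select_features(X, y)
print("selected features:", [j for j, bit in enumerate(mask) if bit])
print("leave-one-out accuracy:", acc)
```

Because the search is driven by the same leave-one-out score that is later reported, the accuracy of the selected subset is an optimistic estimate; this sketch makes no attempt to correct for that.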
