Conference: IEEE Region 10 Humanitarian Technology Conference

predMultiLoc-Gneg: Predicting subcellular localization of gram-negative bacterial proteins using feature selection in gene ontology space and resolving the data imbalance issue

Abstract

Several types of subcellular localization prediction methods have been proposed, built on various classification methods that yield different levels of accuracy. Most of these predictors aim to find the optimal classifier, and very few consider simplifying the complexity of the biological system. However, two issues should be addressed to simplify the prediction system before a successful predictor can be developed: handling high-dimensional features and overcoming the large class imbalance in the training data. In this work, an efficient computational tool named predMultiLoc-Gneg is constructed to predict the multi-label subcellular localization of gram-negative bacterial proteins by (1) selecting relevant GO (Gene Ontology) terms to create a GO subspace, which reduces the feature dimension relative to the whole GO space, and extracting GO-based features for a protein from only these relevant GO terms; and (2) developing a multi-label predictor using a support vector machine (SVM) while resolving the data imbalance issue. The experimental results demonstrate that predMultiLoc-Gneg performs remarkably better than the existing top predictors in predicting protein subcellular localization on a gram-negative bacterial dataset. The web server for predMultiLoc-Gneg is available at http://research.ru.ac.bd/predMultiLoc-Gneg/.
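The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state how relevant GO terms are chosen or how the imbalance is resolved, so everything below is an assumption for illustration only: the frequency threshold `min_count`, the binary presence features, the inverse-frequency class weights (the same heuristic as scikit-learn's `class_weight="balanced"`), and all function names and toy data are hypothetical and are not taken from predMultiLoc-Gneg.

```python
from collections import Counter

def select_go_subspace(train_annotations, min_count=2):
    """Assumed criterion: keep GO terms annotated on >= min_count training proteins."""
    counts = Counter(
        term for terms in train_annotations.values() for term in set(terms)
    )
    return sorted(term for term, n in counts.items() if n >= min_count)

def go_feature_vector(protein_terms, subspace):
    """Binary presence vector over the selected GO subspace only."""
    terms = set(protein_terms)
    return [1 if term in terms else 0 for term in subspace]

def balanced_class_weights(label_sets, location):
    """Inverse-frequency weights for one location's binary (one-vs-rest) problem."""
    n = len(label_sets)
    n_pos = sum(1 for labels in label_sets if location in labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute weights")
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}

# Toy data (hypothetical): protein id -> GO terms, protein id -> set of locations.
annotations = {
    "P1": ["GO:0005886", "GO:0016020"],
    "P2": ["GO:0005886", "GO:0005737"],
    "P3": ["GO:0005737"],
    "P4": ["GO:0016020", "GO:0009279"],
}
locations = {
    "P1": {"inner membrane"},
    "P2": {"inner membrane", "cytoplasm"},  # multi-label protein
    "P3": {"cytoplasm"},
    "P4": {"outer membrane"},
}

subspace = select_go_subspace(annotations, min_count=2)
features = {pid: go_feature_vector(t, subspace) for pid, t in annotations.items()}
weights = balanced_class_weights(list(locations.values()), "outer membrane")

print(subspace)
print(features["P4"])
print(weights)
```

In a full pipeline, each location would get its own binary SVM trained on these feature vectors with the corresponding weights, and a protein would be assigned every location whose classifier predicts positive. That one-vs-rest arrangement is a common way to build a multi-label predictor, but whether the paper uses it is not stated in the abstract.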
