Source: Life (NIH literature collection)

Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization



Abstract

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy, based on multiple classifiers, for predicting protein subcellular localization in plants, with the aim of improving both the accuracy and the reliability of the predictions. Predicting plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. To address this, we propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization, forming a total of 479 feature spaces. Feature selection methods were then used to reduce the dimensionality to smaller, informative feature subsets. The reduced feature subset was used to train three different individual models. An average voting approach then combined the outputs of the three classifiers to return the final probability prediction. Based on the voting probability, the method can predict both single- and multilabel subcellular localizations. Experimental results indicated that the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments on the testing dataset.
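
The average-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the made-up probabilities, the 0.5 threshold, and the fall-back to the single highest-scoring compartment are all assumptions chosen for illustration, since the abstract states only that predictions are based on the voting probability.

```python
from typing import Dict, List


def average_vote(model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-compartment probabilities across classifiers."""
    if not model_probs:
        raise ValueError("need at least one classifier output")
    compartments = model_probs[0].keys()
    return {
        c: sum(p[c] for p in model_probs) / len(model_probs)
        for c in compartments
    }


def predict_locations(avg: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every compartment whose averaged probability reaches the threshold.

    The threshold and the top-1 fall-back are assumptions, not from the paper.
    """
    chosen = sorted(c for c, p in avg.items() if p >= threshold)
    if not chosen:
        # No compartment passes: fall back to the single best-scoring one.
        chosen = [max(avg, key=avg.get)]
    return chosen


# Made-up probabilities from three hypothetical classifiers for one protein.
outputs = [
    {"nucleus": 0.70, "cytoplasm": 0.60, "vacuole": 0.05},
    {"nucleus": 0.80, "cytoplasm": 0.40, "vacuole": 0.10},
    {"nucleus": 0.60, "cytoplasm": 0.65, "vacuole": 0.15},
]
avg = average_vote(outputs)
print(predict_locations(avg))
```

With these illustrative numbers, two compartments pass the threshold, so the protein is reported as multilabel; a protein with only one compartment above the threshold would be reported as single-label.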
