Ensemble random projection for multi-label classification with application to protein subcellular localization

Abstract

The curse of dimensionality severely restricts the predictive power of multi-label classification systems. High-dimensional feature vectors may contain redundant or irrelevant information, causing classification systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble of multi-label classifiers. The merits of the proposed method are demonstrated on a multi-label protein classification task. Specifically, high-dimensional feature vectors are extracted from protein sequences using the Gene Ontology (GO) and Swiss-Prot databases. The feature vectors are then projected onto lower-dimensional spaces by random projection matrices whose elements follow a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to predict the subcellular localization of proteins. Experimental results suggest that the proposed method can reduce the feature dimensionality sevenfold while substantially improving classification performance.
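The pipeline in the abstract (random projection, one-vs-rest SVMs per projection, score fusion) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Gaussian N(0, 1) entries are used as one example of a zero-mean, unit-variance distribution. The 1/sqrt(k) scaling, the Pegasos-style linear SVM, averaging as the fusion rule, and the "positive fused score" decision rule are all assumptions. Every function name is hypothetical. Random vectors stand in for the GO/Swiss-Prot features, whose construction is not shown; the toy run projects 14 dimensions down to 2, a sevenfold reduction.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1) entries (zero mean, unit variance)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dim vector x to k dims via y = R x / sqrt(k).

    The 1/sqrt(k) factor is an assumption: it makes E[||y||^2] = ||x||^2.
    """
    k = len(R)
    return [sum(r * xj for r, xj in zip(row, x)) / math.sqrt(k) for row in R]


def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Binary linear SVM (hinge loss) trained with Pegasos-style subgradient steps.

    y holds +1/-1 targets; a bias is emulated by appending a constant 1 feature.
    """
    Xb = [xi + [1.0] for xi in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w, z):
    """Signed decision value of a linear SVM (last weight is the bias)."""
    return sum(wj * zj for wj, zj in zip(w, z + [1.0]))


def train_ensemble(X, Y, n_labels, k, n_members, seed=0):
    """Train one bank of one-vs-rest SVMs per random projection matrix.

    X: list of d-dim feature vectors; Y: list of label sets (multi-label).
    """
    rng = random.Random(seed)
    d = len(X[0])
    members = []
    for _ in range(n_members):
        R = random_projection_matrix(d, k, rng)
        Z = [project(R, x) for x in X]
        # One-vs-rest: label c is +1 for proteins annotated with c, else -1.
        ws = [train_linear_svm(Z, [1 if c in labels else -1 for labels in Y], rng)
              for c in range(n_labels)]
        members.append((R, ws))
    return members


def predict(members, x):
    """Average each label's score over members, then return every label with a
    positive fused score (or the single best label if none is positive)."""
    n_labels = len(members[0][1])
    fused = [0.0] * n_labels
    for R, ws in members:
        z = project(R, x)
        for c, w in enumerate(ws):
            fused[c] += svm_score(w, z) / len(members)
    labels = {c for c, s in enumerate(fused) if s > 0}
    if not labels:
        labels = {max(range(n_labels), key=lambda c: fused[c])}
    return fused, labels


# Toy usage: 14-dim random features reduced sevenfold to k = 2, three labels.
rng = random.Random(1)
X = [[rng.random() for _ in range(14)] for _ in range(30)]
Y = [{i % 3} | ({(i + 1) % 3} if i % 5 == 0 else set()) for i in range(30)]
members = train_ensemble(X, Y, n_labels=3, k=2, n_members=5)
scores, labels = predict(members, X[0])
print([round(s, 3) for s in scores], sorted(labels))
```

Each ensemble member sees a different random subspace, so averaging their scores reduces the variance introduced by any single projection. Other fusion rules (e.g. weighted sums or voting) would slot into `predict` in the same place.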
