IEEE Transactions on Pattern Analysis and Machine Intelligence

The Fisher-Markov Selector: Fast Selecting Maximally Separable Feature Subset for Multiclass Classification with Applications to High-Dimensional Data

Abstract

Selecting features for multiclass classification is a critically important task in pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can obtain only a local optimum rather than the global optimum. Toward efficiently selecting the globally optimal subset of features, we introduce a new selector, which we call the Fisher-Markov selector, to identify the features that are most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with sparsity as an optimization objective. With properly identified measures of sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach to optimizing these measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, its cost can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. Experimental results confirm the effectiveness of our method.
In pattern recognition and from a model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function whose solution can in fact be obtained as an explicit expression.
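The claim that the selection objective can be unconstrained, with an explicitly expressible optimum, can be illustrated with a simplified sketch. This is not the paper's formulation. We assume (hypothetically) a per-feature Fisher-style score s_k, defined as between-class scatter minus `gamma` times within-class scatter, and a sparsity penalty `beta` for each selected feature. This gives the objective J(a) = sum_k a_k (s_k - beta) over binary indicators a_k. Because no term couples different features, the objective splits into independent per-feature decisions, and its global optimum is explicit: select feature k exactly when s_k > beta. The names `fisher_scores`, `select_features`, `gamma`, and `beta` are illustrative assumptions, not identifiers from the paper.

```python
from statistics import mean


def fisher_scores(X, y, gamma=1.0):
    """Hypothetical per-feature separability score (not the paper's exact measure).

    X: list of observations, each a list of p feature values.
    y: list of class labels, one per observation.
    Returns, for each feature, (between-class scatter - gamma * within-class scatter) / n.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for k in range(p):
        col = [row[k] for row in X]
        overall = mean(col)
        s_between = 0.0
        s_within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = mean(vals)
            # Between-class: class-size-weighted squared distance of the class mean to the overall mean.
            s_between += len(vals) * (mu_c - overall) ** 2
            # Within-class: squared deviations of class members from their own class mean.
            s_within += sum((v - mu_c) ** 2 for v in vals)
        scores.append((s_between - gamma * s_within) / n)
    return scores


def select_features(X, y, gamma=1.0, beta=0.0):
    """Maximize J(a) = sum_k a_k * (s_k - beta) over binary a_k.

    The objective has no pairwise terms, so it is separable and its global
    optimum is explicit: a_k = 1 exactly when s_k > beta.
    """
    return [k for k, s in enumerate(fisher_scores(X, y, gamma)) if s > beta]
```

For example, with X = [[1.0, 5.0, 0.0], [1.2, 4.0, 0.1], [3.0, 5.0, 0.0], [3.2, 4.0, 0.1]] and y = [0, 0, 1, 1], only the first feature separates the two classes, so select_features(X, y) returns [0]. Each feature is scored independently, so this toy version scales linearly with the number of features. It does not reproduce the kernel-based objectives or the Markov random field optimization described in the abstract, which become necessary when the objective couples features.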