The Fisher-Markov Selector: Fast Selecting Maximally Separable Feature Subset for Multiclass Classification with Applications to High-Dimensional Data

Cheng Qiang; Zhou Hongbo; Cheng Jie

Abstract

Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can obtain only a local optimum rather than the global optimum. Toward the efficient selection of the globally optimal subset of features, we introduce a new selector, which we call the Fisher-Markov selector, to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.
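
The closing claim, that the best subset follows from a simple unconstrained objective with an explicit solution, can be illustrated with a minimal sketch. It assumes a per-feature score (between-class scatter minus `gamma` times total scatter) and a sparsity threshold `beta`. The function names, the score formula, and the toy data are illustrative assumptions, not the paper's exact formulation. Under this assumed objective, the sum over features of a_j * (score_j - beta) with each a_j in {0, 1}, the problem decouples by feature, so the maximizer simply keeps every feature whose score exceeds `beta`.

```python
# Hypothetical sketch: Fisher-style per-feature scoring with a threshold.
# This is NOT the paper's exact selector; gamma, beta and the score are assumptions.

def feature_scores(X, y, gamma):
    """Score each feature as between-class scatter minus gamma * total scatter."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        # Total scatter of feature j around its global mean.
        total = sum((v - mean) ** 2 for v in col) / n
        # Between-class scatter: class means around the global mean, weighted by class size.
        between = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            class_mean = sum(vals) / len(vals)
            between += len(vals) * (class_mean - mean) ** 2 / n
        scores.append(between - gamma * total)
    return scores


def select_features(X, y, gamma=0.5, beta=0.0):
    """Explicit maximizer of sum_j a_j * (score_j - beta) over a_j in {0, 1}."""
    scores = feature_scores(X, y, gamma)
    return [j for j, s in enumerate(scores) if s > beta]


# Toy data: feature 0 separates the three classes, feature 1 is
# class-independent noise, feature 2 is constant.
X = [
    [0.0, 1.0, 5.0],
    [0.2, -1.0, 5.0],
    [4.0, 1.0, 5.0],
    [4.2, -1.0, 5.0],
    [8.0, 1.0, 5.0],
    [8.2, -1.0, 5.0],
]
y = [0, 0, 1, 1, 2, 2]

selected = select_features(X, y)
print(selected)
```

In this sketch each feature is scored independently, so the cost grows linearly with the number of features and no combinatorial search over subsets is needed. Raising `beta` yields a sparser subset, and raising `gamma` penalizes features with large overall variance more heavily.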

Bibliographic information

Source
Pattern Analysis and Machine Intelligence, IEEE Transactions on | 2011, Issue 6 | pp. 1217-1233 | 17 pages
Authors
Cheng Qiang; Zhou Hongbo; Cheng Jie
Author affiliation
Southern Illinois University Carbondale, Carbondale
Original format: PDF
Language: English
Keywords
Classification; Fisher's linear discriminant analysis; Markov random field; feature subset selection; high-dimensional data; kernel
