DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets

Mina Alibeigi; Sattar Hashemi; Ali Hamzeh

Abstract

Imbalanced data sets are pervasive in real-world practice and have therefore become an active research area within the machine learning community. They significantly reduce the performance of standard classifiers that are used to learn the concepts underlying the data. The problem becomes even more severe when the imbalanced data sets are also high dimensional. This paper presents a novel feature ranking approach based on probability density estimation to cope with these issues. The idea behind our approach, named Density Based Feature Selection (DBFS), is that the distributions of features over classes can bring significant benefits to feature selection algorithms. In other words, to explore the contribution of each attribute and assign it an appropriate rank, DBFS takes into account each feature's distribution over all classes along with the correlations between features. To show the effectiveness of the presented approach, well-known feature ranking methods are implemented and compared with our approach across a variety of small sample size, high dimensional data sets from the microarray, mass spectrometry and text mining domains. Our theoretical analysis and experimental observations indicate that our approach is the method of choice, offering a simple yet effective feature ranking method based on well-known statistical evaluation measures.
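
The abstract does not give the DBFS scoring formula, so the following is only a minimal sketch of the general idea of ranking features by how well their per-class densities separate. It is not the authors' algorithm. As assumptions made purely for illustration: densities are approximated with shared-bin histograms, a feature's score is one minus the mean pairwise overlap of its class histograms, and the correlation term mentioned in the abstract is left out. The names `density_overlap_scores`, `rank_features` and the `bins` parameter are hypothetical.

```python
import numpy as np


def density_overlap_scores(X, y, bins=10):
    """Score each feature by how little its per-class histograms overlap.

    Illustrative stand-in only: histograms approximate the class-conditional
    densities, and the score is 1 - mean pairwise overlap (higher is better).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if len(classes) < 2:
        raise ValueError("at least two classes are required")

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        lo, hi = col.min(), col.max()
        if lo == hi:
            scores[j] = 0.0  # a constant feature cannot separate classes
            continue
        # Shared bin edges so the class histograms are directly comparable.
        edges = np.linspace(lo, hi, bins + 1)
        # Normalise counts per class, so a small minority class carries
        # the same total mass (1.0) as a large majority class.
        masses = []
        for c in classes:
            counts, _ = np.histogram(col[y == c], bins=edges)
            masses.append(counts / counts.sum())
        # Overlap of two histograms = sum of bin-wise minima (between 0 and 1).
        overlaps = [
            np.minimum(masses[a], masses[b]).sum()
            for a in range(len(classes))
            for b in range(a + 1, len(classes))
        ]
        scores[j] = 1.0 - np.mean(overlaps)
    return scores


def rank_features(X, y, bins=10):
    """Return feature indices ordered from most to least class-separating."""
    return np.argsort(-density_overlap_scores(X, y, bins), kind="stable")
```

Calling `rank_features(X, y)` on a samples-by-features array `X` with class labels `y` returns column indices, best first. Normalising each class histogram separately is the design choice that matters for imbalanced data: without it, the majority class would dominate any overlap measure. With very few samples per class, histogram estimates are coarse, so a smaller `bins` value or a smoother density estimator may be preferable.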

Bibliographic details

Source
Data & Knowledge Engineering | 2012, issue 2012 | pp. 67-103 | 37 pages
Authors
Mina Alibeigi; Sattar Hashemi; Ali Hamzeh
Author affiliations

CSE and IT Dept., Engineering Campus Number 2, Mollasadra Ave., Shiraz, Iran;

CSE and IT Dept., Engineering Campus Number 2, Mollasadra Ave., Shiraz, Iran;

CSE and IT Dept., Engineering Campus Number 2, Mollasadra Ave., Shiraz, Iran;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
feature selection; imbalanced data set; probability density function (PDF);

