An Embedded Feature Selection Method for Imbalanced Data Classification

Haoyue Liu; MengChu Zhou; Qing Liu

Abstract

Imbalanced datasets are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For such datasets, improving the accuracy with which the minority class is identified is critically important. Feature selection is one way to address this issue: an effective feature selection method chooses a subset of features that favors the accurate determination of the minority class. A decision tree is a classifier that can be built with different splitting criteria. Its advantage is that it is easy to see which feature is used at each splitting node, so a decision tree splitting criterion can itself serve as a feature selection method. In this paper, we propose an embedded feature selection method based on a weighted Gini index (WGI). The area under a receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Comparison with the Chi2, F-statistic and Gini index feature selection methods shows that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, the proposed method has the highest probability of achieving the best performance. Experimental results with two datasets show that ROC AUC can be high even if only a few features are selected and used, and that it changes only slightly as more features are selected. F-measure, however, reaches excellent performance only if 20% or more of the features are chosen. These results can help practitioners select a proper feature selection method when facing a practical problem.
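The idea of using a decision tree splitting criterion as a feature selector can be illustrated with a minimal sketch. The abstract does not give the definition of WGI, so the code below uses the standard (unweighted) Gini index, one of the baselines named above, and is not the authors' method. The function names (`gini`, `best_gini_gain`, `rank_features`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_gain(values, labels):
    """Largest impurity decrease over all threshold splits on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        # Size-weighted impurity of the two child nodes.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(X, y):
    """Return (feature indices sorted by decreasing gain, per-feature gains)."""
    gains = [best_gini_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return order, gains


# Toy imbalanced data (hypothetical): 8 majority samples (0), 2 minority (1).
# Feature 0 separates the classes; feature 1 is close to uninformative.
X = [[0.1, 5], [0.2, 3], [0.3, 4], [0.2, 5], [0.4, 3],
     [0.3, 4], [0.1, 3], [0.4, 5], [0.9, 4], [0.8, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

order, gains = rank_features(X, y)
print(order, gains)
```

In this sketch a feature's score is the best impurity decrease it achieves at a single split, and selecting the top-k indices of `order` gives the feature subset. A weighted variant would presumably change how the classes contribute to the impurity, but that detail has to come from the paper itself.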

Bibliographic Record

Source
Acta Automatica Sinica (English Edition) | 2019, Issue 3 | pp. 703-715 | 13 pages
Authors
Haoyue Liu; MengChu Zhou; Qing Liu
Affiliations

[1] Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA

[2] Institute of Systems Engineering, Macau University of Science and Technology, Macau 999078, China

Original format: PDF
Text language: CHI
Chinese Library Classification: Automation technology, computer technology
Keywords
Classification and regression tree; feature selection; imbalanced data; weighted Gini index (WGI);

