Journal: Concurrency, practice and experience

Weighted ReliefF with threshold constraints of feature selection for imbalanced data classification

Abstract

Feature selection is a useful step in data classification, since inherent data heterogeneity and feature redundancy are common in the current era of data explosion. Some commonly used feature selection algorithms, including but not limited to the Pearson correlation coefficient, the maximal information coefficient, and ReliefF, are well-posed under the assumption that instances are distributed homogeneously in datasets. In practice, however, this assumption may not hold. In the presence of data imbalance, these traditional feature selection algorithms may become ineffective because they are biased against the minority class, which contains few samples. This article aims to develop an effective feature selection algorithm for imbalanced judicial datasets, one capable of extracting essential features while discarding negligible ones according to practical feature requirements. To achieve this goal, the number and the distribution of samples in each class are fully taken into account in the correlation analysis. Compared with traditional feature selection algorithms, the proposed improved ReliefF algorithm provides: (i) different feature weights according to the characteristics of heterogeneous samples in different classes; (ii) fair treatment of imbalanced datasets; and (iii) threshold constraints derived from practical feature requirements. Finally, experiments on a judicial dataset and six public datasets illustrate the effectiveness and superiority of the proposed feature selection algorithm in improving classification accuracy on imbalanced datasets.
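The abstract does not give the update rule, so the following is only a rough sketch of the general idea, not the article's algorithm. It assumes a standard ReliefF-style score (nearest hits lower a feature's weight, nearest misses raise it), an inverse-class-frequency instance weight so that minority-class samples are not drowned out, and a plain score threshold for the final selection. The function name `weighted_relieff_sketch`, the weighting formula, and the parameters `k` and `threshold` are all illustrative assumptions.

```python
import numpy as np


def weighted_relieff_sketch(X, y, k=3, threshold=0.0):
    """ReliefF-style feature scores with an ASSUMED inverse-class-frequency
    instance weight and a score threshold (not the article's exact method)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale every feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xs = (X - lo) / span

    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    # Assumption: samples from rarer classes get proportionally larger weights.
    inst_w = {c: 1.0 / (len(classes) * prior[c]) for c in classes}

    scores = np.zeros(d)
    total_w = 0.0
    for i in range(n):
        diff = np.abs(Xs - Xs[i])      # per-feature differences to sample i
        dist = diff.sum(axis=1)        # Manhattan distance to sample i
        w_i = inst_w[y[i]]
        total_w += w_i
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]        # never use the sample as its own neighbor
            if idx.size == 0:
                continue
            near = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diff[near].mean(axis=0)
            if c == y[i]:
                # Nearest hits: differing from same-class neighbors is penalized.
                scores -= w_i * mean_diff
            else:
                # Nearest misses: weighted by the class prior, as in ReliefF.
                scores += w_i * prior[c] / (1.0 - prior[y[i]]) * mean_diff

    scores /= total_w
    selected = np.where(scores > threshold)[0]
    return scores, selected
```

Calling `weighted_relieff_sketch(X, y, k=3, threshold=0.3)` returns the per-feature scores and the indices of the features whose score exceeds the threshold. The threshold here is a single scalar cutoff chosen by the caller; how the article derives its threshold constraints from practical feature requirements is not stated in the abstract.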
