IEEE International Conference on Networking, Sensing and Control (conference paper)

Weighted Gini index feature selection method for imbalanced data

Abstract

Imbalanced class problems arise in many real-world applications, e.g., fraud detection, text classification, and cancer diagnosis. Besides rebalancing the skewed class distribution, another important way to counter the bias toward the majority class is proper feature selection. This work aims to find a feature selection method that chooses a subset of features yielding high ROC AUC and F-measure on the minority class. In this paper, a weighted Gini index (WGI) feature selection method is proposed. To evaluate it, WGI is compared with Chi-square, F-statistic, and Gini index feature selection, with XGBoost used as the classifier to test the performance of each selected feature subset. Experimental results indicate that F-statistic performs best when only a few features are selected; however, as the number of selected features increases, WGI feature selection achieves the best results. A comparison between the average ROC AUC and F-measure results is also presented: ROC AUC is consistently good, even when only a few features are selected, and changes only slightly as the feature subset grows, whereas F-measure reaches good performance only after 60% of the features are chosen. These results can help practitioners choose a suitable feature selection method when facing a practical problem.
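
The abstract does not state the WGI formula. As a rough illustration of how a Gini index can score and rank features, the Python sketch below computes, for each discrete feature, the drop in Gini impurity when samples are split by that feature's values, and keeps the top k. The optional `class_weight` argument and the helper `inverse_frequency_weights` are assumptions added here to show one possible way of up-weighting a minority class; they are not the paper's weighted Gini index. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence

Weights = Optional[Dict[Hashable, float]]


def gini_impurity(class_mass: Dict[Hashable, float]) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a (possibly weighted) class distribution."""
    total = sum(class_mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in class_mass.values())


def gini_score(feature: Sequence[Hashable], labels: Sequence[Hashable],
               class_weight: Weights = None) -> float:
    """Drop in Gini impurity when the samples are split by a discrete feature.

    class_weight=None counts every sample once (plain Gini score). A weight map
    is a hypothetical up-weighting of classes, not the paper's WGI formula.
    """
    weights = class_weight or {}
    parent: Dict[Hashable, float] = defaultdict(float)
    children: Dict[Hashable, Dict[Hashable, float]] = defaultdict(lambda: defaultdict(float))
    for value, label in zip(feature, labels):
        w = weights.get(label, 1.0)
        parent[label] += w           # class mass of the whole node
        children[value][label] += w  # class mass per feature value
    total = sum(parent.values())
    if total == 0:
        return 0.0
    # Impurity after the split: children weighted by their share of the total mass.
    split_impurity = sum(
        (sum(child.values()) / total) * gini_impurity(child)
        for child in children.values()
    )
    return gini_impurity(parent) - split_impurity


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Hypothetical helper: weight n / (k * n_c) so every class gets equal total mass."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def select_top_k(X: List[List[Hashable]], labels: Sequence[Hashable], k: int,
                 class_weight: Weights = None) -> List[int]:
    """Indices of the k columns of X with the highest Gini score (ties keep column order)."""
    n_features = len(X[0]) if X else 0
    scores = [gini_score([row[j] for row in X], labels, class_weight)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

On a toy set of 8 majority and 2 minority samples, a feature that perfectly separates the classes scores 0.32 unweighted and 0.5 with inverse-frequency weights, while a feature equal to 1 for two majority samples and one minority sample scores 8/525 (about 0.015) and 1/30 (about 0.033) respectively. In the setup described above, the selected column subset would then be passed to a classifier such as XGBoost and evaluated with ROC AUC and F-measure.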