Journal: International Journal of Software Engineering and Knowledge Engineering

Feature Selection Method Based on Weighted Mutual Information for Imbalanced Data


Abstract

The class imbalance problem degrades the performance of feature selection on imbalanced data. Traditional feature selection algorithms typically assume a balanced class distribution and take overall classification accuracy as the optimization goal, so the result tends to be dominated by the large classes while the small ones are ignored. This paper proposes a novel feature selection method for imbalanced data based on weighted mutual information (WMI), referred to as the WMI algorithm. The WMI algorithm assigns different weights to the samples using the fuzzy c-means (FCM) clustering algorithm and then calculates the mutual information based on the weight of each sample. The AUC is used as the evaluation criterion for the selected features. Finally, four imbalanced datasets from the NASA software defect datasets are used to validate the proposed approach. Experimental results show that the proposed method achieves higher prediction accuracy for both the minority class and the majority class.
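The idea of computing mutual information from per-sample weights can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the FCM memberships are turned into weights, so the weights are simply taken as an input here. The function name `weighted_mutual_information`, the restriction to discrete features, and the example weights are all assumptions made for illustration.

```python
import numpy as np


def weighted_mutual_information(x, y, w):
    """Weighted mutual information (in nats) between a discrete feature x
    and a class label y.

    Assumption: each sample contributes its weight w[i], instead of a count
    of 1, to the estimated joint distribution. How the weights are derived
    (e.g. from FCM memberships) is outside this sketch.
    """
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    w = np.asarray(w, dtype=float).ravel()
    w = w / w.sum()  # normalise so the weights sum to 1

    # Map feature values and labels to integer indices.
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)

    # Accumulate the weighted joint distribution p(x, y).
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), w)

    # Marginals p(x) and p(y), kept 2-D so their product has joint's shape.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)

    # Sum only over cells with non-zero mass to avoid log(0).
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


if __name__ == "__main__":
    feature = [0, 0, 0, 1]
    label = [0, 0, 0, 1]            # one minority-class sample
    uniform = [1, 1, 1, 1]          # ordinary (unweighted) mutual information
    upweighted = [1, 1, 1, 3]       # hypothetical weights favouring the minority
    print(weighted_mutual_information(feature, label, uniform))
    print(weighted_mutual_information(feature, label, upweighted))
```

With uniform weights the function reduces to the ordinary plug-in estimate of mutual information. Raising the weight of the minority-class sample increases the score of a feature that separates that sample, which is the kind of effect a weighting scheme for imbalanced data is meant to produce.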

Bibliographic information

  • Source
  • Author affiliations

    College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong Province, P. R. China;

    College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong Province, P. R. China;

    College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong Province, P. R. China;

    School of Microelectronics, Tianjin University, Tianjin 300072, P. R. China;

    Institute for Sensing and Embedded Network Systems Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Feature selection; fuzzy c-means clustering; imbalanced data; mutual information;

  • Date indexed: 2022-08-18 04:04:21
