Journal: Connection Science

A weighted pattern matching approach for classification of imbalanced data with a fireworks-based algorithm for feature selection


Abstract

Learning a classifier from imbalanced data is a challenging problem in machine learning. A dataset is said to be imbalanced when the number of instances belonging to one class is much smaller than the number of instances belonging to the other class. Classifiers that prove efficient on standard data fail when the data is imbalanced, as they are overtrained by the majority class instances. Since class imbalance is a common characteristic of real-world data, better classifiers are essential. This paper proposes a novel instance-based classification algorithm called Weighted Pattern Matching based Classification (PMC+) for classifying imbalanced data. PMC+ classifies unlabelled instances by computing the absolute difference between the feature values of the instances in the dataset and those of the unlabelled instance. PMC+ employs a simple classification procedure with weights and shows reasonably good performance. To improve the performance of PMC+, a Fireworks-based Feature and Weight Selection algorithm built on the idea of PMC+ is also proposed. PMC+ is evaluated on 44 binary imbalanced datasets and 15 multiclass imbalanced datasets. Although PMC+ does not employ a resampling or cost-sensitive method, experiments show that PMC+ is effective for classification of imbalanced data. The experimental results were validated using various non-parametric statistical tests.
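
The abstract describes PMC+ only at a high level: weighted absolute differences between the feature values of stored instances and those of the unlabelled instance. The sketch below illustrates that general idea and is not the paper's algorithm. The function name `weighted_abs_diff_classify`, the per-class averaging of scores, and the choice of the class with the smallest average are all assumptions made here for illustration; the abstract does not state the actual decision rule or how the weights are obtained.

```python
import numpy as np

def weighted_abs_diff_classify(X_train, y_train, x, weights=None):
    """Label x using weighted absolute feature differences (illustrative only)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    if weights is None:
        # Assumption: equal feature weights when none are supplied.
        weights = np.ones(X_train.shape[1])
    weights = np.asarray(weights, dtype=float)

    # Weighted sum of per-feature absolute differences to every stored instance.
    scores = (np.abs(X_train - x) * weights).sum(axis=1)

    # Assumption: average the scores within each class, so the number of
    # instances in a class does not by itself decide the outcome.
    classes = np.unique(y_train)
    class_scores = np.array([scores[y_train == c].mean() for c in classes])
    return classes[np.argmin(class_scores)]


# Toy imbalanced data: four instances of class 0, one of class 1.
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0], [5.0, 5.0]]
y = [0, 0, 0, 0, 1]

print(weighted_abs_diff_classify(X, y, [4.5, 4.8]))
print(weighted_abs_diff_classify(X, y, [1.0, 5.0], weights=[1.0, 0.0]))
```

Setting a feature's weight to zero removes it from the comparison, which is one simple way to picture how feature selection and weight selection could be combined in a single search, as the Fireworks-based step in the abstract suggests.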
