Source: International Conference on Advanced Materials and Information Technology Processing

A Comparison Study of Cost-sensitive Learning and Sampling Methods on Imbalanced Data Sets


Abstract

A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more often than as the minority class, because classifiers are designed to maximize overall classification accuracy. We compare three methods for classifying data sets in which the class distribution is imbalanced and misclassification costs are non-uniform: cost-sensitive learning, in which the misclassification cost is embedded in the algorithm; over-sampling; and under-sampling. The aim is to determine which method produces the best overall classification under any circumstance. We reach the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than them, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
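
The three families of methods named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not name its classifiers or sampling procedures, so everything here is an assumption made for illustration. The helper names (`random_oversample`, `random_undersample`, `min_cost_threshold`, `cost_sensitive_predict`) are hypothetical, the sampling is plain random duplication and removal, and the cost-sensitive part uses the standard minimum-expected-cost decision threshold rather than a cost embedded inside a learning algorithm.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of smaller classes until all classes
    match the size of the largest class (random over-sampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (random under-sampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


def min_cost_threshold(cost_fp, cost_fn):
    """Probability of the minority (positive) class above which predicting
    'positive' has the lower expected cost, assuming correct predictions
    cost nothing."""
    return cost_fp / (cost_fp + cost_fn)


def cost_sensitive_predict(p_minority, cost_fp, cost_fn):
    """Return 1 (minority) if that choice minimizes expected cost, else 0."""
    return 1 if p_minority >= min_cost_threshold(cost_fp, cost_fn) else 0


if __name__ == "__main__":
    # Toy data: 90 majority items (label 0) and 10 minority items (label 1).
    samples = list(range(100))
    labels = [0] * 90 + [1] * 10

    _, over_labels = random_oversample(samples, labels)
    _, under_labels = random_undersample(samples, labels)
    print("over-sampled class counts: ", dict(Counter(over_labels)))
    print("under-sampled class counts:", dict(Counter(under_labels)))

    # A missed minority case is assumed to cost 9x a false alarm.
    print("decision threshold:", min_cost_threshold(cost_fp=1, cost_fn=9))
    print("prediction at p=0.2:", cost_sensitive_predict(0.2, 1, 9))
```

In this sketch, over-sampling keeps every original item and grows the data set, under-sampling discards majority items and shrinks it, and the cost-sensitive rule leaves the data untouched and shifts the decision boundary instead. That difference in how much data each approach retains is one plausible reading of why the abstract singles out small data sets as a special case.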
