International Conference on Advances in Pattern Recognition

Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets



Abstract

The main challenge in learning from imbalanced datasets is that a large number of training examples are available for the negatives (majority-class instances) but very few for the positives (minority-class instances). As a result, a classifier can show good overall performance even though its classification rate on positives is greatly reduced. The Quartiles based UnderSampling (QUS) method proposed in this paper addresses this problem in a simple way: it balances the dataset by selecting negatives according to their similarity to five quartile points, namely the minimum, quartile 1 (Q1), the median, quartile 3 (Q3), and the maximum. The intention is to reduce the influence of the excess negatives, which would otherwise bias the classifier towards better classification of negatives. An advantage of this undersampling method is that it is parameter-independent, and it gives better results than state-of-the-art methods. The proposed method is tested with a kNN (k Nearest Neighbour) classifier, and the empirical results show a higher classification rate of positives than is obtained on the original, unprocessed imbalanced dataset.
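The abstract does not spell out how similarity to the five quartile points is measured or how many negatives are taken per point, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes per-feature quartiles computed over the negatives, Euclidean distance as the similarity measure, and round-robin selection across the five reference points until the negatives match the number of positives. The function names `quartile_undersample` and `balance` are hypothetical.

```python
import numpy as np

def quartile_undersample(X_neg, n_keep):
    """Pick n_keep negatives closest to five quartile reference points.

    Assumptions (not stated in the abstract): per-feature quartiles,
    Euclidean distance, round-robin selection over the reference points.
    X_neg must be a 2-D array of shape (n_negatives, n_features).
    """
    X_neg = np.asarray(X_neg, dtype=float)
    n_keep = min(n_keep, len(X_neg))
    # Five reference vectors: per-feature min, Q1, median, Q3, max.
    refs = np.percentile(X_neg, [0, 25, 50, 75, 100], axis=0)
    # dists[i, j] = Euclidean distance from reference i to negative j.
    dists = np.linalg.norm(refs[:, None, :] - X_neg[None, :, :], axis=2)
    # order[i] lists negatives from most to least similar to reference i.
    order = np.argsort(dists, axis=1, kind="stable")
    chosen, seen = [], set()
    # Walk the five rankings in parallel: rank 0 of each, then rank 1, ...
    for idx in order.T.ravel():
        if len(chosen) >= n_keep:
            break
        idx = int(idx)
        if idx not in seen:
            seen.add(idx)
            chosen.append(idx)
    return np.array(sorted(chosen), dtype=int)

def balance(X, y, positive_label=1):
    """Undersample negatives so both classes have the same size."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos_idx = np.flatnonzero(y == positive_label)
    neg_idx = np.flatnonzero(y != positive_label)
    keep = quartile_undersample(X[neg_idx], n_keep=len(pos_idx))
    sel = np.sort(np.concatenate([pos_idx, neg_idx[keep]]))
    return X[sel], y[sel]
```

The balanced arrays returned by `balance` could then be passed to any kNN implementation for training. Other choices of distance or selection order are equally compatible with the abstract's description.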
