Conference: International Conference on Advances in Pattern Recognition

Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets


Abstract

The main challenge in learning from imbalanced datasets is that a large set of training examples is available for the negatives (majority class instances) but very few for the positives (minority class instances). As a result, a classifier may show good overall performance even though its classification rate on positives is drastically reduced. The Quartiles based UnderSampling (QUS) method proposed in this paper addresses this problem in a simple way: it balances the dataset by selecting negatives according to their similarity to five quartile values, namely the minimum, quartile 1 (Q1), median, quartile 3 (Q3) and maximum. The intention is to reduce the influence of excess negatives on the classifier, which would otherwise be biased towards classifying negatives well. An advantage of this undersampling method is that it is parameter independent, and it gives better results than state-of-the-art methods. The proposed method is tested with a kNN (k Nearest Neighbour) classifier, and the empirical results show a higher classification rate of positives than on the original, unprocessed imbalanced dataset.
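
The abstract does not say how the similarity to the five quartile values is computed, so the following is only a minimal sketch of one possible reading, not the authors' procedure. It assumes that each reference point is built from the same summary statistic of every feature (all minima, all Q1 values, and so on), that similarity is Euclidean distance, and that negatives are picked round-robin across the five reference points until the requested count is reached. The names `five_number_summary` and `quartile_undersample` are hypothetical.

```python
import math
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


def quartile_undersample(negatives, n_keep):
    """Pick n_keep negatives lying closest to five quartile reference points.

    Assumptions (not taken from the paper): reference point j combines the
    j-th summary statistic of every feature, and similarity is measured by
    Euclidean distance.
    """
    n_features = len(negatives[0])
    summaries = [
        five_number_summary([row[f] for row in negatives])
        for f in range(n_features)
    ]
    references = [tuple(s[j] for s in summaries) for j in range(5)]

    # For each reference point, rank all negatives from nearest to farthest.
    rankings = [
        sorted(
            range(len(negatives)),
            key=lambda i, ref=ref: math.dist(negatives[i], ref),
        )
        for ref in references
    ]

    target = min(n_keep, len(negatives))
    chosen, seen = [], set()
    positions = [0] * 5
    # Round-robin so that each reference point contributes about equally.
    while len(chosen) < target:
        for j in range(5):
            if len(chosen) >= target:
                break
            ranking = rankings[j]
            # Skip negatives already selected via another reference point.
            while positions[j] < len(ranking) and ranking[positions[j]] in seen:
                positions[j] += 1
            if positions[j] < len(ranking):
                idx = ranking[positions[j]]
                seen.add(idx)
                chosen.append(idx)
    return [negatives[i] for i in chosen]


# Toy data: 100 negatives and 10 positives, two features each.
negatives = [(float(i), float(i % 7)) for i in range(100)]
positives = [(200.0 + i, 10.0) for i in range(10)]
balanced_negatives = quartile_undersample(negatives, n_keep=len(positives))
```

Here `balanced_negatives` has as many rows as `positives`, so the two lists together form a balanced training set that could then be passed to a kNN classifier. This sketch has no tunable parameter beyond the target count, which is in the spirit of the parameter independence claimed in the abstract, though the paper may achieve it differently.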
