Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets


Abstract

The main challenge in learning from imbalanced datasets is that a large set of training examples is available for the negatives (majority class instances) but very few for the positives (minority class instances). As a result, a classifier may show good overall performance even though the classification rate of positives is greatly reduced. The Quartiles based UnderSampling (QUS) method proposed in this paper addresses this problem in a simple way: it balances the dataset by selecting negatives based on their similarity to five quartile values: minimum, quartile 1 (Q1), median, quartile 3 (Q3), and maximum. The intention is to reduce the influence of excess negatives on the classifier, which would otherwise bias it towards better classification of negatives. An advantage of this undersampling method is that it is parameter independent, and it gives better results than state-of-the-art methods. The proposed method is tested with a kNN (k Nearest Neighbour) classifier, and the empirical results show an improved classification rate of positives compared with the original unprocessed imbalanced dataset.
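The abstract does not say how similarity to the quartile values is measured or how many negatives are kept per quartile, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that the five reference points are built per feature, that similarity is Euclidean distance to the nearest reference point, and that the number of negatives kept equals the number of positives. The function names `five_number_summary`, `quartile_undersample`, and `balance` are hypothetical.

```python
import math
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a sequence of numbers.

    Needs at least two values on older Python versions.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


def quartile_undersample(negatives, n_keep):
    """Return sorted indices of the n_keep negatives nearest a reference point.

    Assumption (not stated in the abstract): each negative is scored by its
    Euclidean distance to the closest of the five reference points, and the
    lowest-scoring negatives are kept.
    """
    if not 0 < n_keep <= len(negatives):
        raise ValueError("n_keep must be between 1 and the number of negatives")
    # One five-number summary per feature (column).
    summaries = [five_number_summary(col) for col in zip(*negatives)]
    # Transpose into five reference points with one value per feature.
    refs = list(zip(*summaries))
    scores = [min(math.dist(row, ref) for ref in refs) for row in negatives]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(negatives)), key=lambda i: scores[i])
    return sorted(ranked[:n_keep])


def balance(X, y, positive_label=1):
    """Keep every positive plus an equal number of selected negatives."""
    pos = [i for i, label in enumerate(y) if label == positive_label]
    neg = [i for i, label in enumerate(y) if label != positive_label]
    kept = quartile_undersample([X[i] for i in neg], min(len(pos), len(neg)))
    idx = sorted(pos + [neg[j] for j in kept])
    return [X[i] for i in idx], [y[i] for i in idx]
```

The balanced output of `balance` could then be passed to any kNN implementation. Under this reading no tunable parameter is introduced, which is consistent with the parameter independence the abstract claims, but the scoring rule itself remains an assumption.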


Bibliographic details

Source
International Conference on Advances in Pattern Recognition | 2017 | pp. 1-6 | 6 pages
Conference location: Bangalore (IN)
Authors
C.V. Krishna Veni; T. Sobha Rani
Author affiliation
SCIS, University of Hyderabad, Hyderabad, India
Original format: PDF
Keywords
Training; Sensitivity; Complexity theory; Indexes; Medical diagnosis; Credit cards; Linear matrix inequalities


