...

Class imbalance and the curse of minority hubs

Knowledge-Based Systems

Abstract

Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that has been shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularities. This paper evaluates the impact of hubness on learning under class imbalance with k-nearest neighbor methods. Our results suggest that, contrary to common belief, minority-class hubs might be responsible for most misclassification in many high-dimensional datasets. Standard approaches to learning under class imbalance clearly favor instances of the minority class and are not well suited for handling such highly detrimental minority points. In our experiments, we evaluated several state-of-the-art hubness-aware kNN classifiers that learn from neighbor occurrence models calculated on the training data. The experiments included learning under severe class imbalance, class overlap and mislabeling, and the results suggest that the hubness-aware methods usually achieve promising results on the examined high-dimensional datasets. The improvements are most pronounced when handling the difficult point types: borderline points, rare points and outliers. On most examined datasets, the hubness-aware approaches improve the classification precision of the minority classes and the recall of the majority class, which helps reduce the negative impact of minority hubs. We argue that it might prove beneficial to combine the extensible hubness-aware voting frameworks with existing class-imbalanced kNN classifiers, in order to properly handle class-imbalanced data in high-dimensional feature spaces.
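The neighbor occurrence models and hubness-aware voting mentioned in the abstract can be made concrete. Below is a minimal sketch in Python/NumPy, assuming a small dataset with brute-force Euclidean distances: it counts each training point's "good" and "bad" k-occurrences (how often the point appears among other points' k nearest neighbors with a matching or mismatching label) and then scales each neighbor's vote by the exponential of its negative standardized bad-occurrence count, in the spirit of hubness-weighted kNN. The function names and the toy data are illustrative assumptions, not the exact classifiers evaluated in the paper.

    import numpy as np

    def k_occurrences(X, labels, k):
        # For each training point, count how often it appears among the k
        # nearest neighbors of the other points (its k-occurrence N_k),
        # split into label-matching ("good") and label-mismatching ("bad")
        # occurrences. Brute-force distances; adequate for small datasets.
        n = X.shape[0]
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)        # a point is not its own neighbor
        good = np.zeros(n)
        bad = np.zeros(n)
        for i in range(n):
            for j in np.argsort(dists[i])[:k]: # the k nearest neighbors of i
                if labels[j] == labels[i]:
                    good[j] += 1
                else:
                    bad[j] += 1                # j occurs as a "bad" neighbor
        return good, bad

    def hubness_weighted_knn(X, labels, query, k, bad):
        # Each neighbor's vote is down-weighted by its standardized bad
        # k-occurrence count, so detrimental hubs contribute less.
        w = np.exp(-(bad - bad.mean()) / (bad.std() + 1e-12))
        d = np.linalg.norm(X - query, axis=1)
        votes = {}
        for j in np.argsort(d)[:k]:
            votes[labels[j]] = votes.get(labels[j], 0.0) + w[j]
        return max(votes, key=votes.get)

    # Toy imbalanced data: 40 majority vs. 8 minority points in 50 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (40, 50)),
                   rng.normal(0.5, 1.0, (8, 50))])
    y = np.array([0] * 40 + [1] * 8)
    good, bad = k_occurrences(X, y, k=5)
    print(hubness_weighted_knn(X, y, X[0], k=5, bad=bad))

In high dimensions the distribution of N_k becomes heavily skewed, which is the hubness phenomenon the paper studies; weighting by bad occurrences is one simple way to make the vote more robust to detrimental minority hubs.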
