Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

HIBoost: A hubness-aware ensemble learning algorithm for high-dimensional imbalanced data classification


Abstract

Learning from high-dimensional imbalanced data arises in many important real-world applications and poses a severe challenge to traditional data mining and machine learning algorithms. Existing work generally applies dimensionality reduction to cope with the curse of dimensionality, then applies traditional imbalanced-learning techniques to address class imbalance. However, dimensionality reduction may discard useful information, especially for the minority classes. This paper introduces an ensemble-based method, HIBoost, that handles the imbalanced learning problem directly in the high-dimensional space. HIBoost takes into account the hubness phenomenon inherent to high-dimensional data: such data tend to contain singular points, namely hubs and anti-hubs, which occur unusually frequently or unusually rarely, respectively, among the k-nearest neighbors of other points. For these hubs and anti-hubs, HIBoost introduces a discount factor that restricts their weight growth during the weight update, reducing the risk of overfitting when training the component classifiers. To address class imbalance, HIBoost uses SMOTE to balance the training data in each iteration, alleviating the prediction bias of the component classifiers. Experimental results on sixteen high-dimensional imbalanced data sets demonstrate the effectiveness of HIBoost.
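
The two hubness-related ideas in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the rule for flagging hubs (k-occurrence more than `z` standard deviations above the mean) and anti-hubs (k-occurrence of zero), and the form of the discount (multiplying the AdaBoost-style exponent by a factor `discount` below 1). The per-iteration SMOTE resampling step is left out.

```python
import numpy as np

def k_occurrence(X, k):
    """Count how often each point appears in the k-NN lists of the other points."""
    n = X.shape[0]
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]               # indices of the k nearest neighbors
    return np.bincount(knn.ravel(), minlength=n)

def singular_mask(counts, z=2.0):
    """Flag hubs and anti-hubs (assumed rule, not taken from the paper)."""
    hubs = counts > counts.mean() + z * counts.std()
    anti_hubs = counts == 0
    return hubs | anti_hubs

def discounted_update(w, misclassified, alpha, singular, discount=0.5):
    """AdaBoost-style weight update with damped growth for singular points.

    Assumed form: a misclassified ordinary point is scaled by exp(alpha),
    a misclassified hub or anti-hub only by exp(discount * alpha).
    """
    growth = np.where(misclassified, alpha, 0.0)
    growth = np.where(singular, discount * growth, growth)
    w_new = w * np.exp(growth)
    return w_new / w_new.sum()                        # renormalize to a distribution
```

With `discount=1.0` the update reduces to the usual exponential reweighting of misclassified points, so the discount factor is the only place where hubness enters this sketch.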
