International Joint Conference on Neural Networks

Combining uniform manifold approximation with localized affine shadowsampling improves classification of imbalanced datasets


Abstract

Oversampling approaches are a popular choice for improving classification on imbalanced datasets. The SMOTE algorithm is the pioneer of a family of algorithms built as extensions of SMOTE to address its tendency to over-generalize the minority class. Some of these extensions learn the minority-class data distribution through clustering and manifold learning techniques. The Localised Random Affine Shadowsampling (LoRAS) algorithm models the convex space of the minority class and controls the local variance of each synthetic sample by constructing it as a convex combination of multiple shadow samples, which are generated by adding Gaussian noise to the original minority samples. LoRAS also uses t-SNE as a manifold learning step to identify minority-class data neighbourhoods. The algorithm is known to outperform several early SMOTE extensions, improving F1-Score and Balanced accuracy on highly imbalanced classification problems. However, the state-of-the-art manifold learning algorithm UMAP is known to preserve the local and global structure of the latent data manifold better than t-SNE and to be considerably faster. We have integrated UMAP as the manifold learning step of localized affine shadowsampling to build the LoRAS-UMAP algorithm, and benchmarked it against several state-of-the-art oversampling algorithms on 14 publicly available datasets characterized by high imbalance, high dimensionality, and high absolute imbalance. In summary, using UMAP instead of t-SNE for the manifold learning step yields better F1-Score, Balanced accuracy, and runtime for the LoRAS algorithm, particularly on high-dimensional datasets.
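The following is a minimal, illustrative Python sketch of the scheme the abstract describes, not the authors' reference implementation: UMAP (from the umap-learn package) is used only to identify minority-class neighbourhoods, Gaussian noise produces shadow samples within each neighbourhood, and a random convex combination of a few shadow samples yields one synthetic point. The function name and all parameters (k_neighbors, n_shadows, noise_scale, n_affine) are assumptions chosen for illustration.

```python
# Illustrative sketch of UMAP-guided localized affine shadowsampling.
# Not the authors' implementation; parameter names and defaults are assumed.
import numpy as np
import umap  # pip install umap-learn
from sklearn.neighbors import NearestNeighbors

def loras_umap_oversample(X_min, n_synthetic, k_neighbors=5, n_shadows=40,
                          noise_scale=0.05, n_affine=4, random_state=0):
    rng = np.random.default_rng(random_state)

    # 1. Manifold learning step: embed the minority class with UMAP and use
    #    the embedding only to define local neighbourhoods.
    embedding = umap.UMAP(n_neighbors=k_neighbors,
                          random_state=random_state).fit_transform(X_min)
    nn = NearestNeighbors(n_neighbors=k_neighbors).fit(embedding)
    neighbourhoods = nn.kneighbors(embedding, return_distance=False)

    synthetic = []
    for _ in range(n_synthetic):
        # 2. Pick a random minority point and work inside its UMAP neighbourhood.
        idx = rng.integers(len(X_min))
        local = X_min[neighbourhoods[idx]]

        # 3. Shadowsampling: jitter neighbourhood points with Gaussian noise.
        shadows = (local[rng.integers(len(local), size=n_shadows)]
                   + rng.normal(scale=noise_scale,
                                size=(n_shadows, X_min.shape[1])))

        # 4. Localised affine step: a random convex combination of a few
        #    shadow samples gives one synthetic minority point.
        chosen = shadows[rng.choice(n_shadows, size=n_affine, replace=False)]
        weights = rng.dirichlet(np.ones(n_affine))
        synthetic.append(weights @ chosen)

    return np.vstack(synthetic)
```

In such a sketch, the synthetic points returned for the minority class would simply be appended to the training set before fitting a classifier; the Dirichlet weights keep each synthetic point inside the convex hull of its shadow samples, which is what limits over-generalization relative to plain SMOTE interpolation.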
