Journal: IEEE Transactions on Fuzzy Systems
Fuzzy Support Vector Machine With Relative Density Information for Classifying Imbalanced Data

Abstract

Fuzzy support vector machines (FSVMs) have been combined with class imbalance learning (CIL) strategies to classify skewed data. However, existing approaches have several inherent drawbacks that lead to inaccurate estimates of the prior data distribution, which in turn degrade the quality of the classification model. To solve this problem, this paper presents a more robust method for extracting prior data distribution information, named relative density, together with two novel FSVM-CIL algorithms based on it. The proposed algorithms use a strategy similar to K-nearest-neighbors-based probability density estimation (KNN-PDE) to calculate the relative density of each training instance. In particular, the relative density is independent of the dimensionality of the data distribution in feature space and reflects only the significance of each instance within its class; hence, it is more robust than absolute distance information. In addition, the relative density captures the prior data distribution information well whether the distribution is simple or complex. Even for data with small disjuncts or a large class overlap, the relative density information reflects the details well. We evaluated the proposed algorithms on a number of synthetic and real-world imbalanced datasets. The results show that the proposed algorithms clearly outperform several previous methods, especially on datasets with complex distributions.
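
The abstract does not give the exact formula for relative density, so the following is only a minimal sketch of one plausible reading of the KNN-PDE-like idea. It assumes that an instance's relative density can be approximated by the reciprocal of the Euclidean distance to its k-th nearest neighbor within the same class, and that a fuzzy membership can be obtained by dividing by the largest density in that class. The function names `relative_density` and `fuzzy_memberships`, the parameter `k`, and the normalization are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def relative_density(X, k):
    """Assumed proxy: reciprocal of the distance to the k-th nearest
    neighbor among the rows of X (all rows belong to one class)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of instances")
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the self-distance (0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dist, axis=1)[:, k]
    # Guard against division by zero when points coincide.
    return 1.0 / np.maximum(kth, 1e-12)


def fuzzy_memberships(X, y, k=3):
    """Assumed normalization: density divided by the class maximum,
    giving a membership in (0, 1] for every training instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    m = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        dens = relative_density(X[idx], k)
        m[idx] = dens / dens.max()
    return m


if __name__ == "__main__":
    # Toy data: class 0 has an outlier at (10, 10).
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10],
                  [5, 5], [5, 6], [6, 5]], dtype=float)
    y = np.array([0, 0, 0, 0, 1, 1, 1])
    print(fuzzy_memberships(X, y, k=2))
```

Because each density is compared only with other instances of the same class, the resulting membership is a ratio and does not depend on the absolute scale of distances, which matches the robustness argument in the abstract. Memberships of this kind could then be supplied as per-instance weights to an SVM trainer that accepts them, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.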