Conference: IEEE International Conference on Computer and Communications

An Improved Weighted KNN Algorithm for Imbalanced Data Classification

Abstract

The k-nearest neighbor (KNN) algorithm is a widely used classification method in data mining. One open problem for KNN is how to choose an appropriate value of k: a single global k is rarely optimal for every instance, especially when instances differ widely from one another. In this paper, we use a proposed training method (PTM) that selects an optimal local k for each instance according to the distribution characteristics of the data, and we apply it to class-imbalanced data sets. Then, because a neighbor's influence on an instance depends on its distance from that instance, we assign different weights to the neighbors, an approach called weighted k-nearest neighbor (WKNN), and classify each test instance by weighted voting. The resulting PTM-WKNN method combines the advantages of these earlier methods and aims to improve classification performance on imbalanced data. We run experiments on class-imbalanced data sets from the University of California, Irvine (UCI) Machine Learning Repository and report Recall, G-mean, and F-score as evaluation metrics. The experimental results show that the proposed method performs better on class-imbalanced data sets.
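
The weighted-voting step and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the weight formula nor the PTM procedure, so the inverse-distance weight `1 / (distance + eps)` is an assumption, `k` is simply passed in by the caller in place of the per-instance value PTM would select, and the function names `weighted_knn_predict` and `imbalance_metrics` are hypothetical. The metrics use the standard binary definitions (G-mean as the square root of recall times specificity, F-score as the harmonic mean of precision and recall).

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    Assumption: weight = 1 / (distance + eps); the abstract does not give the formula.
    """
    # Pair each training instance's Euclidean distance to the query with its label.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        votes[label] += 1.0 / (distance + eps)  # closer neighbors count more
    return max(votes, key=votes.get)


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return Recall, G-mean, and F-score for a binary problem (standard definitions)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))

    def ratio(num, den):
        # Guard against empty classes so the metric is 0.0 instead of an error.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "g_mean": math.sqrt(recall * specificity),
        "f_score": ratio(2 * precision * recall, precision + recall),
    }


# Toy data (made up for illustration): class 1 is the minority, 2 of 5 instances.
train_X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.9, 1.0)]
train_y = [1, 1, 0, 0, 0]
prediction = weighted_knn_predict(train_X, train_y, query=(0.2, 0.1), k=5)
metrics = imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

On this toy data an unweighted majority vote with k = 5 would pick class 0, since three of the five neighbors belong to it, whereas the distance-weighted vote picks the minority class 1 because its two instances lie much closer to the query.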