Journal: Mathematics
A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning

Abstract

The K-nearest neighbors (KNN) algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, it faces computational challenges when applied to big data. KNN determines the class of a new sample from the classes of its nearest neighbors, but finding those neighbors in a large dataset is so computationally expensive that the search is no longer feasible on a single machine. Pruning is one of the techniques proposed to make classification methods applicable to large datasets. LC-KNN is an improved KNN method that first partitions the data into smaller clusters using K-means clustering and then, for each new sample, applies KNN only within the partition whose center is nearest to the sample. However, because clusters differ in shape and density, selecting the appropriate cluster is a challenge. This paper proposes an approach that improves the pruning phase of LC-KNN by taking these factors into account. The proposed approach helps choose a more appropriate cluster in which to search for neighbors, thereby increasing classification accuracy. Its performance is evaluated on several real datasets. The experimental results show that the proposed approach is effective, achieving higher classification accuracy and lower time cost than other recent relevant methods.
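The baseline LC-KNN idea described in the abstract (cluster with K-means, then run KNN only inside the partition whose center is nearest to the query) can be sketched roughly as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names `kmeans` and `lc_knn_predict`, the toy data, and all parameter values are assumptions made for this example. It does not reproduce the paper's improved cluster selection, which also accounts for cluster shape and density and is not detailed in the abstract.

```python
import math
import random
from collections import Counter


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]

    def nearest_center(p):
        return min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        assign = [nearest_center(p) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels are consistent with the returned centers.
    assign = [nearest_center(p) for p in points]
    return centers, assign


def lc_knn_predict(x, points, labels, centers, assign, k=3):
    """Classify x by majority vote among its k nearest neighbors,
    searching only the partition whose center is nearest to x."""
    nonempty = set(assign)  # ignore clusters that ended up with no members
    nearest = min(nonempty, key=lambda c: math.dist(x, centers[c]))
    # Pruning: keep only the training samples in the selected partition.
    candidates = [(math.dist(x, p), y)
                  for p, y, a in zip(points, labels, assign) if a == nearest]
    candidates.sort(key=lambda t: t[0])
    votes = Counter(y for _, y in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed for illustration): two well-separated groups.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

centers, assign = kmeans(train, n_clusters=2)
pred_a = lc_knn_predict((0.05, 0.1), train, labels, centers, assign, k=3)
pred_b = lc_knn_predict((5.05, 5.1), train, labels, centers, assign, k=3)
print(pred_a, pred_b)
```

The nearest-center rule in `lc_knn_predict` is the step the abstract identifies as problematic: when clusters differ in shape or density, the partition with the closest center is not necessarily the one that contains the query's true nearest neighbors.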
