A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning

Hamid Saadatfar; Samiyeh Khosravi; Javad Hassannataj Joloudari; Amir Mosavi; Shahaboddin Shamshirband

Abstract

The K-nearest neighbors (KNN) algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data poses computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors, but identifying those neighbors in a large dataset is so computationally expensive that it is no longer feasible on a single machine. One technique proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first partitions the data into smaller clusters using K-means clustering and then, for each new sample, applies KNN only within the cluster whose center is nearest to that sample. However, because clusters differ in shape and density, selecting the appropriate cluster is a challenge. This paper proposes an approach that improves the pruning phase of LC-KNN by taking these factors into account. The proposed approach helps choose a more appropriate cluster in which to search for neighbors, thereby increasing classification accuracy. Its performance is evaluated on several real datasets. The experimental results show that the proposed approach achieves higher classification accuracy and lower time cost than other recent relevant methods.
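As an illustration of the baseline LC-KNN idea described in the abstract (cluster first, then search for neighbors only in the cluster with the nearest center), the following is a minimal sketch in plain Python. It is not the authors' implementation and does not include the shape- and density-aware cluster selection the paper proposes, since the abstract gives no formula for it. The function names, the toy data, the seeding of centers with the first m points, and the fixed iteration count are all assumptions made for the example.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))


def kmeans(points, m, iters=20):
    """Plain Lloyd's k-means. Seeding with the first m points is an
    assumption chosen to keep the sketch deterministic."""
    centers = [list(p) for p in points[:m]]
    for _ in range(iters):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(m):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment, consistent with the final centers.
    assign = [nearest_center(p, centers) for p in points]
    return centers, assign


def lc_knn_predict(query, points, labels, centers, assign, k=3):
    """Baseline LC-KNN step: run KNN only inside the cluster whose
    center is nearest to the query (assumes that cluster is non-empty)."""
    chosen = nearest_center(query, centers)
    candidates = sorted(
        (dist(query, p), y)
        for p, y, a in zip(points, labels, assign)
        if a == chosen
    )
    votes = Counter(y for _, y in candidates[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels = ["a", "b", "a", "a", "b", "b"]

centers, assign = kmeans(points, m=2)
print(lc_knn_predict((0.5, 0.5), points, labels, centers, assign))
print(lc_knn_predict((10.5, 10.5), points, labels, centers, assign))
```

In this sketch the cluster is chosen purely by distance to the center. That is the step the abstract identifies as problematic when clusters have different shapes and densities, because a query near a cluster boundary may have its true neighbors in a different cluster from the one whose center is closest.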


Bibliographic details

Source: Mathematics, 2020, Issue 2, 12 pages
Original format: PDF
Keywords: K-nearest neighbors; KNN; classifier; machine learning; big data; clustering; cluster shape; cluster density; classification; reinforcement learning; machine learning for big data; data science; computation; artificial intelligence


