Conference: Joint International Information Technology and Mechanical and Electronic Engineering Conference

Research on improved K-nearest neighbor algorithm based on Spark platform


Abstract

Big data technology is growing rapidly. The birth of Hadoop drew attention to the study of MapReduce. Spark, by introducing the RDD data model and a memory-based computing model, adapts well to data mining on big data and outperforms Hadoop in iterative computing, so it has quickly become a research focus for many enterprises and scholars. The K-nearest neighbor algorithm (hereafter KNN) is a very important classification algorithm. Many people are studying it, but there is no mature solution for parallelizing the algorithm on the Spark platform. In this paper, the author implements a parallelized, improved KNN on the Spark platform. We use a clustering algorithm to find the weight of each training sample in the training set; the weights of the K nearest neighbors of a test sample are then used to differentiate among those neighbors when classifying it. Experiments show that the improved KNN achieves better accuracy.
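
The abstract does not say how the weights are computed or how the work is split across Spark partitions, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes one centroid per class as a stand-in for the clustering step, a hypothetical weight of 1 / (1 + distance to the sample's own class centroid), and a weighted vote among the K nearest neighbors. Plain Python lists stand in for Spark partitions: `local_top_k` plays the role of a per-partition map step and `classify` merges the partial results. All function names are illustrative.

```python
import heapq
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(samples):
    """Mean feature vector per label (assumption: one cluster per class)."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def sample_weights(samples):
    """Hypothetical weighting: samples closer to their class centroid weigh more."""
    centroids = class_centroids(samples)
    return [1.0 / (1.0 + euclidean(f, centroids[lab])) for f, lab in samples]


def local_top_k(partition, query, k):
    """Map step: the k nearest (distance, weight, label) triples in one partition."""
    scored = ((euclidean(f, query), w, lab) for f, lab, w in partition)
    return heapq.nsmallest(k, scored, key=lambda t: t[0])


def classify(partitions, query, k):
    """Reduce step: merge per-partition candidates, then take a weighted vote."""
    candidates = [c for part in partitions for c in local_top_k(part, query, k)]
    neighbors = heapq.nsmallest(k, candidates, key=lambda t: t[0])
    votes = defaultdict(float)
    for _, weight, label in neighbors:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy data: two well-separated classes, split into two simulated partitions.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
weights = sample_weights(train)
weighted = [(f, lab, w) for (f, lab), w in zip(train, weights)]
partitions = [weighted[0::2], weighted[1::2]]

prediction = classify(partitions, (0.5, 0.5), k=3)
```

Taking the K nearest candidates within each partition before merging keeps the data exchanged between partitions to at most K triples each, which is the usual reason for structuring a distributed KNN query this way.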
