Conference: International Conference on Information, Communication and Networks

Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop



Abstract

With the rapid development of the mobile Internet, the network has become an important medium for people to exchange information, so research on text classification has practical significance. Parallelizing the KNN classification algorithm on the Hadoop platform allows text to be classified quickly and accurately. However, the cost of computing the similarity or distance between sample points grows with the amount of sample data, which increases the algorithm's time complexity and reduces classification accuracy. Therefore, a parallel, improved KNN text classification algorithm based on the Hadoop platform is proposed. The CLARA clustering algorithm is used to remove samples with low similarity from the dataset, which reduces the number of sample-distance calculations. A parallel KNN MapReduce program is then designed to classify network public opinion data. The experimental results show that the improved parallel KNN algorithm improves both the accuracy and the running time of text classification.
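
The abstract does not say how the CLARA pruning and the MapReduce job are wired together, so the following is only a minimal single-machine sketch of one possible reading, not the authors' implementation. It assumes that CLARA medoids partition the training set, that a query is compared only against training samples sharing its nearest medoid, and that each mapper emits a local top-k list which a reducer merges before voting. All function names, the toy 2-D vectors, and the Euclidean distance are illustrative assumptions; a real text classifier would more likely use TF-IDF vectors with cosine similarity and run on Hadoop itself.

```python
import math
import random
from collections import Counter
from heapq import nsmallest

def dist(a, b):
    # Euclidean distance (assumption; the abstract names no metric).
    return math.dist(a, b)

def total_cost(points, medoids):
    # Sum of each point's distance to its closest medoid.
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    # Naive k-medoids: keep swapping while a single swap lowers the cost.
    medoids = list(points[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                if total_cost(points, trial) < total_cost(points, medoids):
                    medoids, improved = trial, True
    return medoids

def clara(points, k, n_draws=5, sample_size=20, seed=0):
    # CLARA: run PAM on random subsamples, keep the medoid set that is
    # cheapest when evaluated on the full dataset.
    rng = random.Random(seed)
    best_cost, best_medoids = None, None
    for _ in range(n_draws):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)
        if best_cost is None or cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids

def nearest(medoids, x):
    # Index of the medoid closest to x.
    return min(range(len(medoids)), key=lambda i: dist(x, medoids[i]))

def map_split(split, query, medoids, k):
    # Mapper: skip samples outside the query's cluster (the pruning step),
    # then emit the k closest (distance, label) pairs of this split.
    target = nearest(medoids, query)
    kept = [(dist(v, query), label) for v, label in split
            if nearest(medoids, v) == target]
    return nsmallest(k, kept)

def reduce_vote(partials, k):
    # Reducer: merge the local top-k lists and take a majority vote.
    merged = nsmallest(k, (pair for part in partials for pair in part))
    return Counter(label for _, label in merged).most_common(1)[0][0]

def classify(splits, query, medoids, k=3):
    partials = [map_split(s, query, medoids, k) for s in splits]
    return reduce_vote(partials, k)

# Toy data: two well-separated groups standing in for two text categories.
train = [((x * 0.1, y * 0.1), "sports") for x in range(4) for y in range(4)]
train += [((10 + x * 0.1, 10 + y * 0.1), "finance")
          for x in range(4) for y in range(4)]
medoids = clara([v for v, _ in train], k=2)
splits = [train[0::2], train[1::2]]  # stand-ins for two input splits
```

Calling `classify(splits, (0.2, 0.1), medoids)` compares the query only against the samples in its own cluster, which is where the reduction in distance computations would come from under this reading.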
