Conference: International Conference on Information, Communication and Networks

Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop

Abstract

With the rapid development of the mobile Internet, the network has become an important medium for exchanging information, so research on text classification has practical significance. Parallelizing the KNN classification algorithm on the Hadoop platform allows text to be classified quickly and accurately. However, the number of similarity or distance computations between sample points grows with the size of the sample data, which increases the algorithm's time complexity and reduces classification accuracy. To address this, a parallel, improved KNN text classification algorithm based on the Hadoop platform is proposed. The CLARA clustering algorithm is used to prune samples with low similarity from the dataset, which reduces the number of sample distance computations. A parallel KNN MapReduce program is then designed to classify online public opinion data. The experimental results show that the improved parallel KNN algorithm improves both the accuracy and the running time of text classification.
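The two stages named in the abstract, cluster-based pruning followed by a map/reduce-style KNN vote, can be illustrated with a small single-machine sketch. This is not the authors' implementation. It assumes the cluster medoids have already been produced by a CLARA run, uses cosine similarity as the similarity measure, and interprets "pruning" as keeping only the training samples that fall in the same cluster as the query. All function names (`prune`, `map_top_k`, `reduce_vote`, `classify`) are hypothetical, and plain Python lists stand in for Hadoop input splits.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_medoid(vec, medoids):
    """Index of the medoid most similar to vec (first one wins on ties)."""
    return max(range(len(medoids)), key=lambda i: cosine(vec, medoids[i]))


def prune(train, medoids, query):
    """Assumed pruning rule: keep only samples in the query's cluster.

    `medoids` is assumed to come from a prior CLARA run; this sketch
    does not implement CLARA itself.
    """
    target = nearest_medoid(query, medoids)
    return [(v, y) for v, y in train if nearest_medoid(v, medoids) == target]


def map_top_k(partition, query, k):
    """Map step: the k most similar (similarity, label) pairs in one split."""
    scored = ((cosine(v, query), y) for v, y in partition)
    return heapq.nlargest(k, scored, key=lambda p: p[0])


def reduce_vote(local_results, k):
    """Reduce step: merge local candidates, keep the global top k, vote."""
    candidates = (p for part in local_results for p in part)
    merged = heapq.nlargest(k, candidates, key=lambda p: p[0])
    if not merged:
        return None  # nothing survived pruning
    return Counter(label for _, label in merged).most_common(1)[0][0]


def classify(train, medoids, query, k=3, n_partitions=2):
    """Prune, split into partitions (stand-ins for HDFS splits), map, reduce."""
    kept = prune(train, medoids, query)
    partitions = [kept[i::n_partitions] for i in range(n_partitions)]
    local = [map_top_k(p, query, k) for p in partitions]
    return reduce_vote(local, k)
```

Because each mapper returns only its local top k candidates, the reducer handles at most `k * n_partitions` pairs regardless of the training set size. On a real cluster the map and reduce functions would run as Hadoop tasks rather than as a list comprehension.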
