Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop

Abstract

With the rapid development of the mobile Internet, the network has become an important medium for exchanging information, which gives research on text classification practical significance. Parallelizing the KNN classification algorithm on the Hadoop platform allows text to be classified quickly and accurately. However, the number of similarity or distance calculations between sample points grows with the amount of sample data, which increases the algorithm's time complexity and reduces classification accuracy. Therefore, parallel processing of an improved KNN text classification algorithm based on the Hadoop platform is proposed. The CLARA clustering algorithm is used to cut out samples with low similarity in the dataset, which reduces the number of sample distance calculations. A parallel KNN MapReduce program is then designed to classify network public opinion data. The experimental results show that the improved parallel KNN algorithm improves both the accuracy and the running time of text classification.
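
The abstract names the two stages (CLARA-based pruning, then a parallel KNN MapReduce job) but gives no implementation details, so the following is only a minimal sketch under several assumptions: documents are sparse term-weight dictionaries compared by cosine similarity, a simple alternating k-medoids step stands in for the PAM step of CLARA, pruning drops training samples whose similarity to every medoid is below a threshold, and the map and reduce phases are simulated in a single process rather than run on Hadoop. All function names, the threshold, and the toy data are hypothetical and not taken from the paper.

```python
import heapq
import math
import random
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def k_medoids(vectors, k, rng, rounds=5):
    """Alternating k-medoids on a small sample (assumed stand-in for PAM)."""
    medoids = rng.sample(vectors, k)
    for _ in range(rounds):
        clusters = defaultdict(list)
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, medoids[i]))
            clusters[best].append(v)
        # New medoid = member most similar to the rest of its cluster.
        medoids = [
            max(clusters[i], key=lambda c: sum(cosine(c, o) for o in clusters[i]))
            if clusters[i] else medoids[i]
            for i in range(k)
        ]
    return medoids

def clara_medoids(vectors, k, sample_size, trials, seed=0):
    """CLARA idea: cluster several random subsamples, keep the medoid set
    that scores best (highest total similarity) on the full dataset."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(trials):
        sample = rng.sample(vectors, min(sample_size, len(vectors)))
        medoids = k_medoids(sample, k, rng)
        score = sum(max(cosine(v, m) for m in medoids) for v in vectors)
        if score > best_score:
            best, best_score = medoids, score
    return best

def prune(train, medoids, threshold):
    """Assumed pruning rule: drop samples dissimilar to every medoid."""
    return [(v, y) for v, y in train
            if max(cosine(v, m) for m in medoids) >= threshold]

def knn_map(split, queries, k):
    """Mapper: for one training split, emit each query's local top-k."""
    for qid, q in queries.items():
        for pair in heapq.nlargest(k, ((cosine(q, v), y) for v, y in split)):
            yield qid, pair

def knn_reduce(qid, pairs, k):
    """Reducer: merge local candidates into a global top-k, then vote."""
    top = heapq.nlargest(k, pairs)
    return qid, Counter(y for _, y in top).most_common(1)[0][0]

def classify(train, queries, k=3, n_splits=2):
    splits = [train[i::n_splits] for i in range(n_splits)]
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for split in splits:
        for qid, pair in knn_map(split, queries, k):
            grouped[qid].append(pair)
    return dict(knn_reduce(qid, pairs, k) for qid, pairs in grouped.items())

# Toy data (hypothetical): two topics plus one outlier document.
train = [
    ({"match": 1.0, "goal": 1.0}, "sports"),
    ({"goal": 1.0, "team": 1.0}, "sports"),
    ({"match": 1.0, "team": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"market": 1.0, "bank": 1.0}, "finance"),
    ({"stock": 1.0, "bank": 1.0}, "finance"),
    ({"weather": 1.0}, "other"),
]
queries = {"q1": {"goal": 1.0, "team": 1.0, "match": 1.0},
           "q2": {"stock": 1.0, "market": 1.0}}

medoids = clara_medoids([v for v, _ in train], k=2, sample_size=5, trials=3)
reduced = prune(train, medoids, threshold=0.4)
print(len(train), "->", len(reduced), "training samples")
print(classify(reduced, queries, k=3, n_splits=2))
```

Each mapper emits only its local top-k candidates per query, so the reducer merges at most k pairs per split instead of one pair per training sample. On a real cluster the same split would likely map onto a Hadoop mapper and reducer, but that wiring is not shown here.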

Bibliographic details

Source
International Conference on Information, Communication and Networks, 2019, pp. 167-170 (4 pages)
Conference location: Macao (CN)
Authors
Shaobo Du; Jing Li
Author affiliation

School of Computer Information Engineering, Guizhou University of Commerce, Guiyang, China

Original format: PDF
Keywords
Classification algorithms; Clustering algorithms; Training; Text categorization; Training data; Partitioning algorithms; Machine learning algorithms;

Similar literature

1. An Improved KNN Text Classification Algorithm Based on Clustering [J]. Zhou Yong, Li Youwen, Xia Shixiong. Journal of Computers, 2009, No. 3.
2. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications, 2012, No. 3a4.
3. An Improved Parallel Collaborative Filtering Algorithm based on Hadoop [J]. Baojun Fu. International Journal of Performability Engineering, 2018, No. 3.
4. Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop [C]. Shaobo Du, Jing Li. International Conference on Information, Communication and Networks, 2019.
5. Parallelization for image processing algorithms based on chain and mid-crack codes [D]. Wong, Wai-Tak. 1999.
6. Improved support vector machine classification algorithm based on adaptive feature weight updating in the Hadoop cluster environment [O]. Jianfang Cao, Min Wang, Yanfei Li. 2012.
7. An Improved KNN Text Classification Algorithm Based on Clustering [O]. Shixiong Xia, Youwen Li, Yong Zhou. 2009.
