Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop

Abstract

With the rapid development of the mobile Internet, the network has become an important medium for exchanging information, so research on text classification has practical significance. Parallelizing the KNN classification algorithm on the Hadoop platform makes it possible to classify text quickly and accurately. However, the cost of computing the similarity or distance between sample points grows with the amount of sample data, which increases the algorithm's time complexity and reduces its classification accuracy. This paper therefore proposes a parallel, improved KNN text classification algorithm based on the Hadoop platform. The CLARA clustering algorithm is used to cut samples with low similarity out of the dataset, which reduces the number of sample-distance calculations. A parallel KNN MapReduce program is then designed to classify online public opinion data. The experimental results show that the improved parallel KNN algorithm improves both the accuracy and the running time of text classification.
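The two stages named in the abstract (cluster-based pruning, then a map/reduce KNN) can be sketched roughly as follows. This is a minimal in-process illustration and not the authors' implementation: it assumes the cluster medoids have already been computed (CLARA would obtain them by running a k-medoids search on samples of the data), it uses cosine similarity over sparse term-weight dictionaries, and every function name and the toy data are hypothetical. The map and reduce steps are plain Python functions standing in for Hadoop tasks.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def prune_by_medoid(train, medoids, query, keep=1):
    """Keep only training samples assigned to the `keep` medoids closest to the query.

    Assumption: `medoids` were produced beforehand by a clustering step such as CLARA.
    """
    ranked = sorted(range(len(medoids)),
                    key=lambda i: cosine(query, medoids[i]), reverse=True)
    kept = set(ranked[:keep])

    def nearest(vec):
        return max(range(len(medoids)), key=lambda i: cosine(vec, medoids[i]))

    return [(vec, label) for vec, label in train if nearest(vec) in kept]


def knn_map(split, query, k):
    """Map step: emit the k most similar (similarity, label) pairs from one data split."""
    return heapq.nlargest(k, ((cosine(query, vec), label) for vec, label in split))


def knn_reduce(partials, k):
    """Reduce step: merge per-split candidates, keep the global top k, majority vote."""
    top = heapq.nlargest(k, (pair for part in partials for pair in part))
    if not top:
        return None
    return Counter(label for _, label in top).most_common(1)[0][0]


def classify(train, medoids, query, k=3, n_splits=2):
    """Prune with the medoids, split the survivors, then run map and reduce."""
    candidates = prune_by_medoid(train, medoids, query)
    splits = [candidates[i::n_splits] for i in range(n_splits)]
    return knn_reduce([knn_map(s, query, k) for s in splits], k)


# Hypothetical toy corpus: term-weight vectors with class labels.
TRAIN = [
    ({"goal": 1.0, "match": 1.0}, "sports"),
    ({"team": 1.0, "goal": 1.0}, "sports"),
    ({"match": 1.0, "team": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"bank": 1.0, "stock": 1.0}, "finance"),
]
# Hypothetical medoids, one per cluster, assumed to come from the clustering step.
MEDOIDS = [{"goal": 1.0, "match": 1.0}, {"stock": 1.0, "market": 1.0}]

print(classify(TRAIN, MEDOIDS, {"goal": 1.0, "team": 1.0}))
```

Pruning before the map step means each mapper only scores samples from the clusters nearest the query, which is where the reduction in distance calculations described in the abstract would come from. On a real cluster the splits would be HDFS blocks and the reduce step would run once per query key.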

Bibliographic details

Source
International Conference on Information, Communication and Networks | 2019 | ix, 233 p. | 4 pages
Authors
Shaobo Du; Jing Li
Original format
PDF
Chinese Library Classification
Automation technology and equipment
Keywords
Classification algorithms; Clustering algorithms; Training; Text categorization; Training data; Partitioning algorithms; Machine learning algorithms

Similar literature

1. An Improved KNN Text Classification Algorithm Based on Clustering [J]. Zhou Yong, Li Youwen, Xia Shixiong. Journal of Computers. 2009, No. 3
2. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications. 2012, No. 3a4
3. An Improved Parallel Collaborative Filtering Algorithm based on Hadoop [J]. Baojun Fu. International Journal of Performability Engineering. 2018, No. 3
4. Parallel Processing of Improved KNN Text Classification Algorithm Based on Hadoop [C]. Shaobo Du, Jing Li. International Conference on Information, Communication and Networks. 2019
5. Parallelization for image processing algorithms based on chain and mid-crack codes [D]. Wong, Wai-Tak. 1999
6. Improved support vector machine classification algorithm based on adaptive feature weight updating in the Hadoop cluster environment [O]. Jianfang Cao, Min Wang, Yanfei Li. 2012
7. An Improved KNN Text Classification Algorithm Based on Clustering [O]. Shixiong Xia, Youwen Li, Yong Zhou. 2009
