Conference: IEEE International Conference on Advanced Computing

Text categorization using Rocchio algorithm and random forest algorithm

Abstract

Millions of files are uploaded and downloaded every minute, creating data at a scale that makes manual text categorization infeasible. Hence, there is a need for automatic categorization of documents that makes storage and retrieval more efficient. This paper proposes a hybrid text categorization model that combines the Rocchio algorithm and the Random Forest algorithm to perform multi-label text categorization. A stop-word remover and a word stemmer are used to overcome the limitations of the Rocchio algorithm. The Random Forest model takes a minimal set of categories as input to reduce its error rate. Experiments were conducted on standard text categorization datasets. The proposed model is found to be more efficient at categorizing documents than other text categorization models such as fuzzy relevance clustering, ML-KNN (multi-label KNN), and Naïve Bayes.
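The abstract does not describe how the two stages are connected, so the following is only a minimal sketch of the Rocchio (nearest-centroid) step with stop-word removal and stemming applied first. Several things here are assumptions rather than the authors' method: the tiny stop-word list, the suffix-stripping `crude_stem` stand-in for a real stemmer, the use of raw term frequency instead of TF-IDF weights, and the idea that a `top_k` shortlist of categories is what a second-stage classifier such as a random forest would receive. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "are", "for", "on"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, stem what remains."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]


def unit_vector(tokens):
    """Term-frequency vector scaled to unit length."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def train_centroids(labeled_docs):
    """Average the unit vectors of each category's training documents."""
    sums, sizes = defaultdict(Counter), Counter()
    for text, label in labeled_docs:
        sizes[label] += 1
        for term, weight in unit_vector(preprocess(text)).items():
            sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_categories(text, centroids, top_k=2):
    """Return the top_k categories whose centroids are closest to the text."""
    query = unit_vector(preprocess(text))
    ranked = sorted(
        centroids, key=lambda label: cosine(query, centroids[label]), reverse=True
    )
    return ranked[:top_k]


# Toy training data, invented purely for illustration.
centroids = train_centroids([
    ("The team scored goals in the match", "sports"),
    ("Players trained for the football match", "sports"),
    ("The bank raised interest rates", "finance"),
    ("Markets and stocks dropped on rates", "finance"),
])
print(rank_categories("football players scored", centroids, top_k=1))
```

In a multi-label setting, the shortlist returned by `rank_categories` could be passed, together with the document's features, to a second classifier that makes the final per-category decision. Whether the paper's hybrid works this way is not stated in the abstract.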
