A Parallel Learning Algorithm for Text Classification


Abstract

Text classification is the process of assigning documents to predefined categories based on their content. Existing supervised learning algorithms for automatic text classification need a sufficient number of labeled documents to learn accurately. Applying the Expectation-Maximization (EM) algorithm to this problem is an alternative approach that uses a large pool of unlabeled documents to augment the available labeled documents. Unfortunately, learning from such a large pool of unlabeled documents takes too long. This paper introduces a novel parallel learning algorithm for the text classification task. The parallel algorithm combines the EM algorithm with the naive Bayes classifier. Our goal is to reduce the computation time of the learning and classification processes. We studied the performance of our parallel algorithm on a large Linux PC cluster called PIRUN Cluster. We report both timing and accuracy results. These results indicate that the proposed parallel algorithm is capable of handling large document collections.
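The abstract does not say how the work is split across the cluster, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a multinomial naive Bayes model with Laplace smoothing, and it assumes that the E-step (estimating class probabilities for unlabeled documents) is divided into independent chunks. All function names (`train_nb`, `posterior`, `e_step_chunk`, `em_naive_bayes`, `classify`) and the `workers` parameter are hypothetical. Threads stand in for cluster nodes purely to show the data-parallel structure; they would not give a real speedup in CPython.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def train_nb(docs, weights, classes, vocab):
    """M-step: fit multinomial naive Bayes from (possibly fractional) class weights.

    docs is a list of token lists; weights is a parallel list of
    {class: P(class | doc)} dictionaries.
    """
    prior = {c: 1.0 for c in classes}                     # Laplace-smoothed class counts
    word = {c: {w: 1.0 for w in vocab} for c in classes}  # Laplace-smoothed word counts
    for tokens, wts in zip(docs, weights):
        for c in classes:
            prior[c] += wts[c]
            for w in tokens:
                word[c][w] += wts[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word[c].values())
        log_word[c] = {w: math.log(n / denom) for w, n in word[c].items()}
    return log_prior, log_word


def posterior(tokens, model, classes):
    """Return P(class | doc) under the current model, computed in log space."""
    log_prior, log_word = model
    scores = {
        c: log_prior[c] + sum(log_word[c][w] for w in tokens if w in log_word[c])
        for c in classes
    }
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


def e_step_chunk(chunk, model, classes):
    """E-step for one chunk; chunks are independent, so they can run in parallel."""
    return [posterior(tokens, model, classes) for tokens in chunk]


def em_naive_bayes(labeled, unlabeled, classes, iterations=10, workers=2):
    """Semi-supervised EM: labeled docs keep hard labels, unlabeled docs get soft ones."""
    l_docs = [tokens for tokens, _ in labeled]
    l_wts = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    vocab = {w for d in l_docs + unlabeled for w in d}
    model = train_nb(l_docs, l_wts, classes, vocab)       # initial model from labeled data
    chunks = [unlabeled[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            results = pool.map(e_step_chunk, chunks,
                               [model] * workers, [classes] * workers)
            u_docs, u_wts = [], []
            for chunk, post in zip(chunks, results):      # gather soft labels
                u_docs.extend(chunk)
                u_wts.extend(post)
            model = train_nb(l_docs + u_docs, l_wts + u_wts, classes, vocab)
    return model


def classify(tokens, model, classes):
    """Pick the class with the highest posterior probability."""
    post = posterior(tokens, model, classes)
    return max(classes, key=lambda c: post[c])
```

In this sketch only the E-step is parallel, and the M-step runs once per iteration after the soft labels are gathered. On a real cluster the per-chunk word counts could also be accumulated locally and then summed, but whether the paper does so is not stated in the abstract.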


Bibliographic details

Source
Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul 23-26, 2002, Edmonton | 2002 | pp. 201-206 | 6 pages
Authors
Canasai Kruengkrai; Chuleerat Jaruskulchai
Original format: PDF
Chinese Library Classification: automation technology, computer technology
Keywords
text classification; parallel expectation-maximization (EM) algorithm; naive Bayes; cluster computing


