Using EM to Classify Text from Labeled and Unlabeled Documents

Abstract

This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is significant because in many important text classification problems obtaining classification labels is expensive, while large quantities of unlabeled documents are readily available. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text, based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 30%.
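The loop described in the abstract (train on labeled documents, probabilistically label the unlabeled ones, retrain on everything, repeat) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, documents given as lists of tokens, and a fixed number of iterations. The function names, the `alpha` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit naive Bayes from (possibly fractional) class weights.

    weights[i][c] is the degree to which document i belongs to class c.
    Laplace smoothing with pseudo-count `alpha` is an assumption of this sketch.
    """
    prior, word_prob = {}, {}
    for c in classes:
        class_mass = sum(w[c] for w in weights)
        prior[c] = (alpha + class_mass) / (alpha * len(classes) + len(docs))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for word in doc:
                counts[word] += w[c]
        total = sum(counts.values())
        word_prob[c] = {
            v: (alpha + counts[v]) / (alpha * len(vocab) + total) for v in vocab
        }
    return prior, word_prob


def posterior(doc, prior, word_prob):
    """E-step for one document: P(class | doc) under the current model."""
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for word in doc:
            if word in word_prob[c]:  # out-of-vocabulary words are skipped
                score += math.log(word_prob[c][word])
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}


def em_naive_bayes(labeled, unlabeled, n_iter=10):
    """Train on labeled docs, then alternate E- and M-steps over all docs."""
    classes = sorted({label for _, label in labeled})
    labeled_docs = [doc for doc, _ in labeled]
    vocab = sorted({w for doc in labeled_docs + unlabeled for w in doc})
    # Labeled documents keep fixed, one-hot class weights throughout.
    hard = [{c: 1.0 if c == label else 0.0 for c in classes} for _, label in labeled]
    prior, word_prob = train_nb(labeled_docs, hard, classes, vocab)
    for _ in range(n_iter):
        soft = [posterior(doc, prior, word_prob) for doc in unlabeled]
        prior, word_prob = train_nb(
            labeled_docs + unlabeled, hard + soft, classes, vocab
        )
    return prior, word_prob


# Toy data (hypothetical): "team" and "senate" never appear in labeled docs.
labeled = [(["ball", "goal"], "sports"), (["vote", "law"], "politics")]
unlabeled = [
    ["ball", "team"],
    ["goal", "team"],
    ["vote", "senate"],
    ["law", "senate"],
]

baseline = em_naive_bayes(labeled, unlabeled, n_iter=0)  # labeled data only
prior, word_prob = em_naive_bayes(labeled, unlabeled, n_iter=10)

print(posterior(["team"], *baseline))
print(posterior(["team"], prior, word_prob))
```

In this toy setup the labeled-only classifier has no evidence about the word "team", while the EM-trained one associates it with "sports" because it co-occurs with "ball" and "goal" in the unlabeled documents. That co-occurrence effect is the kind of information the abstract says unlabeled data can carry.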

Bibliographic details

Authors
Nigam, K.; McCallum, A.; Thrun, S.; Mitchell, T.
Year: 1998
Pages: 1-20
Total pages: 20
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Algorithms; Text processing; Data bases; Data management; Learning machines; Data reduction; Artificial intelligence; Bayes theorem; Word recognition;

Similar literature

1. Semi-supervised Learning To Classify Evaluative Expressions From Labeled And Unlabeled Texts [J]. Yasuhiro Suzuki, Hiroya Takamura, Manabu Okumura. IEICE Transactions on Information and Systems. 2007, No. 10
2. Text Classification from Labeled and Unlabeled Documents using EM [J]. Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun. Machine Learning. 2000, No. 2-3
3. A fuzzy method to learn text classifier from labeled and unlabeled examples [J]. Liu Hong, Huang Shang-teng. Journal of Harbin Institute of Technology. 2004, No. 1
4. Learning to Classify Text from Labeled and Unlabeled Documents [C]. Kamal Nigam, Andrew McCallum, Sebastian Thrun. National Conferences on Artificial Intelligence. 1999
5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters [D]. Li, Xiaoxiao. 2012
6. Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals [O]. Hamed Hassanzadeh, Mahnoosh Kholghi, Anthony Nguyen. 2018
7. Using EM to Classify Text from Labeled and Unlabeled Documents [O]. Kamal Nigam, Andrew McCallum, Sebastian Thrun. 1998
