Improving short-text classification using unlabeled background knowledge to assess document similarity

Abstract

We describe a method for improving the classification of short text strings using a combination of labeled training data plus a secondary corpus of unlabeled but related longer documents. We show that such unlabeled background knowledge can greatly decrease error rates, particularly if the number of examples or the size of the strings in the training set is small. This is particularly useful when labeling text is a labor-intensive job and when there is a large amount of information available about a particular problem on the World Wide Web. Our approach views the task as one of information integration using WHIRL, a tool that combines database functionalities with techniques from the information-retrieval literature.
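The abstract describes the idea only at a high level. As a rough illustration, here is a minimal Python sketch of how an unlabeled background corpus can bridge a short test string and the labeled training strings even when the two share no words. It is not the authors' WHIRL implementation. The following are all assumptions made here: the whitespace tokenizer, the smoothed TF-IDF weights, scoring each test-to-background-to-training path by the product of two cosine similarities, and merging paths that end at the same label with a noisy-or. Every function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(texts):
    # Smoothed IDF over the given corpus (here: training strings plus
    # background documents). Terms never seen in it get the largest weight.
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(tokenize(text)))
    unseen = math.log(1 + n) + 1.0
    weights = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return lambda term: weights.get(term, unseen)


def vectorize(text, idf):
    # Unit-length TF-IDF vector stored as a sparse dict.
    tf = Counter(tokenize(text))
    vec = {t: count * idf(t) for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    # Dot product of two unit vectors, i.e. their cosine similarity.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def classify(test_text, train, background, idf):
    # Assumed scoring: each path test -> background doc -> training example
    # scores sim(test, doc) * sim(doc, example); paths ending at the same
    # label are merged with a noisy-or, 1 - prod(1 - s).
    q = vectorize(test_text, idf)
    train_vecs = [(vectorize(text, idf), label) for text, label in train]
    evidence = defaultdict(list)
    for doc in background:
        b = vectorize(doc, idf)
        s1 = cosine(q, b)
        if s1 <= 0:
            continue  # this background doc shares no terms with the test string
        for vec, label in train_vecs:
            s2 = cosine(b, vec)
            if s2 > 0:
                evidence[label].append(s1 * s2)
    if not evidence:
        return None  # no background document connects test and training data
    scores = {lab: 1 - math.prod(1 - s for s in ss) for lab, ss in evidence.items()}
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
train = [("stock market shares fall", "finance"),
         ("team wins football match", "sports")]
background = [
    "investors sold shares as the stock market dropped and bonds rose",
    "the football team scored twice to win the match in the final minutes",
]
idf = build_idf([text for text, _ in train] + background)

print(classify("investors sold bonds", train, background, idf))  # finance
print(classify("scored twice", train, background, idf))  # sports
```

In this toy data, "investors sold bonds" has zero cosine similarity with both training strings, so a plain nearest-neighbor classifier has nothing to go on; the first background document is what links it to the finance example.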

Bibliographic details

Source: International Conference on Machine Learning, 2000, 8 pages
Authors: Sarah Zelikovitz; Haym Hirsh
Original format: PDF
Chinese Library Classification: Computer applications

Similar literature

1. Improving structural similarity based virtual screening using background knowledge [J]. Tobias Girschick, Lucia Puchbauer, Stefan Kramer. Journal of Cheminformatics, 2013, no. 1.
2. Improving structural similarity based virtual screening using background knowledge [J]. Tobias Girschick, Lucia Puchbauer, Stefan Kramer. Journal of Cheminformatics, 2013, no. S1.
3. Extending WHIRL with background knowledge for improved text classification [J]. Sarah Zelikovitz, William W. Cohen, Haym Hirsh. Information Retrieval, 2007, no. 1.
4. Improving short-text classification using unlabeled background knowledge to assess document similarity [C]. Sarah Zelikovitz, Haym Hirsh. International Conference on Machine Learning, 2000.
5. Using background knowledge to improve text classification [D]. Sarah Zelikovitz. 2002.
6. Improving structural similarity based virtual screening using background knowledge [O]. Tobias Girschick, Lucia Puchbauer, Stefan Kramer. 2013.
7. Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities [O]. Pan-Jun Kim, Jae-Yun Lee. 2007.
8. Using Unlabeled Data to Improve Text Classification [R]. K. P. Nigam. 2001.
