Venue: International Semantic Web Conference

Weakly Supervised Short Text Categorization Using World Knowledge


Abstract

Short text categorization is an important task in many NLP applications, such as sentiment analysis and news feed categorization. Because short texts are sparse and brief, many traditional classification models perform poorly when applied to them directly. Moreover, supervised approaches require large amounts of manually labeled data, and producing such labels is costly, labor-intensive, and time-consuming. This paper proposes a weakly supervised short text categorization approach that does not require any manually labeled data. The proposed model consists of two main modules: (1) a data labeling module, which leverages an external Knowledge Base (KB) to compute probabilistic labels for a given unlabeled training data set, and (2) a classification model based on a Wide & Deep learning approach. The effectiveness of the proposed method is validated via evaluation on multiple datasets. The experimental results show that the proposed approach outperforms unsupervised state-of-the-art classification approaches and achieves performance comparable to supervised approaches.
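The abstract does not say how the data labeling module derives probabilistic labels from the Knowledge Base. As a rough illustration only, the sketch below assumes that each text and each category can be represented by a KB-derived embedding vector, and it turns cosine similarities into a probability distribution with a softmax. The function names, the toy vectors, and the similarity-plus-softmax scheme are all assumptions made for this example, not the paper's stated method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def probabilistic_labels(text_vec, category_vecs):
    """Map a text vector to a probability distribution over categories.

    Hypothetical scheme: softmax over cosine similarities between the
    text vector and each category vector.
    """
    sims = {name: cosine(text_vec, vec) for name, vec in category_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Toy 2-dimensional vectors standing in for KB-derived embeddings (made up).
category_vecs = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
text_vec = [0.9, 0.1]

labels = probabilistic_labels(text_vec, category_vecs)
print(labels)
```

Soft labels of this kind could then serve as training targets for a downstream classifier in place of manually assigned classes, which is the role the abstract gives to the labeling module.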
