Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Semi-supervised and Unsupervised Categorization of Posts in Web Discussion Forums using Part-of-Speech Information and Minimal Features

Abstract

Web discussion forums typically contain posts that fall into different categories such as question, solution, feedback, and spam. Automatically identifying these categories can aid information retrieval tailored to specific user requirements. A number of supervised methods have previously attempted to solve this problem; however, they depend on the availability of abundant training data. The few existing unsupervised and semi-supervised approaches either focus on identifying only one or two categories or do not discuss category-specific performance. In contrast, this work proposes methods for identifying multiple categories and also analyzes category-specific performance. These methods are based on sequence models (specifically, hidden Markov models) that model the language of each category using probabilistic word and part-of-speech information together with a minimal set of manually specified features. The unsupervised version initializes the models using clustering, whereas the semi-supervised version uses a few manually labeled forum posts. Empirical evaluations demonstrate that these methods are more accurate than previous ones.
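The idea of one hidden Markov model per category can be illustrated with a minimal sketch. The abstract does not give the model topology, the feature set, or the training procedure, so everything below is an assumption made for illustration: the two categories, the two hidden states per model, the Penn-Treebank-style tag names, the hand-picked probabilities, and the names `forward_log_likelihood`, `classify`, `MODELS`, and `FLOOR` are all hypothetical. The sketch scores a post's part-of-speech tag sequence under each category's model with the forward algorithm and picks the highest-scoring category; it does not show initialization by clustering or from labeled posts.

```python
import math

FLOOR = 1e-6  # hypothetical emission probability for tags a model has never seen


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs) under a discrete HMM, computed with the forward algorithm.

    start[s]    : probability of starting in hidden state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    states = range(len(start))
    # Initialization: start in state s and emit the first symbol.
    alpha = [math.log(start[s]) + math.log(emit[s].get(obs[0], FLOOR))
             for s in states]
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + math.log(trans[p][s]) for p in states])
            + math.log(emit[s].get(symbol, FLOOR))
            for s in states
        ]
    # Termination: sum over the final states.
    return log_sum_exp(alpha)


# Toy, hand-picked parameters (NOT from the paper): one HMM per category,
# each with two hidden states emitting part-of-speech tags.
MODELS = {
    "question": (
        [0.8, 0.2],
        [[0.6, 0.4], [0.3, 0.7]],
        [
            {"WRB": 0.5, "MD": 0.3, "PRP": 0.1, "VB": 0.05, "NN": 0.05},
            {"WRB": 0.05, "MD": 0.05, "PRP": 0.3, "VB": 0.3, "NN": 0.3},
        ],
    ),
    "solution": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [
            {"WRB": 0.02, "MD": 0.08, "PRP": 0.2, "VB": 0.4, "NN": 0.3},
            {"WRB": 0.02, "MD": 0.08, "PRP": 0.3, "VB": 0.3, "NN": 0.3},
        ],
    ),
}


def classify(obs, models=MODELS):
    """Return the category whose HMM assigns the tag sequence the highest likelihood."""
    return max(models, key=lambda c: forward_log_likelihood(obs, *models[c]))


if __name__ == "__main__":
    # Tag sequence for something like "How can I fix (the) error"
    print(classify(["WRB", "MD", "PRP", "VB", "NN"]))
    # Tag sequence for something like "You restart (the) service, run (the) script"
    print(classify(["PRP", "VB", "NN", "VB", "NN"]))
```

In a real system the tag sequences would come from a part-of-speech tagger, and the parameters would be estimated from data rather than set by hand; working in log space keeps the forward recursion from underflowing on long posts.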
