Conference: International conference on recent advances in natural language processing

A Boosting-based Algorithm for Classification of Semi-Structured Text using Frequency of Substructures


Abstract

Research in text classification currently focuses on challenging tasks such as sentiment classification and modality identification. In these tasks, approaches that use a structural representation, such as a tree, have shown better performance than those that use a bag-of-words representation. In this paper, we propose a boosting algorithm for classifying a text, where a text is a set of sentences each represented by a tree. The algorithm learns rules represented by subtrees together with their frequency information. Existing boosting-based algorithms use subtrees as features without considering their frequency, because they target a single sentence rather than a text. In contrast, our algorithm learns how important the occurrence frequency of each subtree is for classification. Experiments on topic identification of Japanese news articles and English sentiment classification show the effectiveness of subtree features with their frequency.
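The idea of learning rules from subtree frequencies can be illustrated with a small sketch. The code below is not the paper's algorithm: it is plain AdaBoost over decision stumps of the form "subtree occurs at least k times in the text", and it assumes the subtrees have already been extracted and counted for each text. The function names (`train_adaboost`, `predict`) and the bracketed subtree strings are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def stump(doc, subtree, threshold, polarity):
    """Weak rule: vote `polarity` if the subtree occurs >= threshold times."""
    return polarity * (1 if doc.get(subtree, 0) >= threshold else -1)


def train_adaboost(docs, labels, rounds=10):
    """docs: list of {subtree: frequency} mappings; labels: +1 or -1."""
    n = len(docs)
    weights = [1.0 / n] * n
    # Candidate rules: every (subtree, observed frequency) pair as a threshold.
    candidates = sorted({(s, c) for d in docs for s, c in d.items()})
    model = []
    for _ in range(rounds):
        best = None
        for subtree, thr in candidates:
            # Weighted error of the rule with positive polarity.
            err = sum(w for d, y, w in zip(docs, labels, weights)
                      if stump(d, subtree, thr, 1) != y)
            # Weights sum to 1, so the flipped rule has error 1 - err.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, subtree, thr, polarity)
        err, subtree, thr, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, subtree, thr, polarity))
        # Increase the weight of misclassified texts, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(d, subtree, thr, polarity))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, doc):
    """Sign of the weighted vote over all learned rules."""
    score = sum(a * stump(doc, s, t, p) for a, s, t, p in model)
    return 1 if score >= 0 else -1


# Toy data: the subtree appears in every text, so presence alone cannot
# separate the classes, but a frequency threshold of 2 can.
docs = [
    Counter({"(NP (JJ good))": 3, "(VP (VB see))": 1}),
    Counter({"(NP (JJ good))": 2}),
    Counter({"(NP (JJ good))": 1, "(VP (VB see))": 1}),
    Counter({"(NP (JJ good))": 1}),
]
labels = [1, 1, -1, -1]
model = train_adaboost(docs, labels, rounds=3)
```

In this toy setting a presence-only feature cannot distinguish the two classes, which is the kind of situation where frequency information on subtrees becomes useful for text-level rather than sentence-level classification.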
