A Boosting-based Algorithm for Classification of Semi-Structured Text using Frequency of Substructures

Abstract

Research in text classification currently focuses on challenging tasks such as sentiment classification and modality identification. In these tasks, approaches that use a structural representation, such as a tree, have shown better performance than those that use a bag-of-words representation. In this paper, we propose a boosting algorithm for classifying a text that is a set of sentences, each represented by a tree. The algorithm learns rules represented by subtrees together with their frequency information. Existing boosting-based algorithms use subtrees as features without considering their frequency, because those algorithms targeted a single sentence rather than a text. In contrast, our algorithm learns how the occurrence frequency of each subtree matters for classification. Experiments on topic identification of Japanese news articles and on English sentiment classification show the effectiveness of subtree features with their frequency.
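The abstract does not give the exact rule form or weighting scheme, so the following is only a hypothetical sketch of the general idea, not the author's algorithm. It assumes each sentence is a labeled ordered tree, restricts "subtrees" to single nodes and parent-child edges to keep the example short, and swaps in plain discrete AdaBoost with weak rules of the form "if subtree t occurs at least ξ times in the text, predict c, otherwise predict -c". The names `Node`, `text_features`, `train`, and `classify` are invented for illustration.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled node of an ordered sentence tree (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)


def subtree_counts(root: Node) -> Counter:
    """Count size-1 subtrees (node labels) and size-2 subtrees (parent->child edges)."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[(node.label,)] += 1
        for child in node.children:
            counts[(node.label, child.label)] += 1
            stack.append(child)
    return counts


def text_features(sentences: list[Node]) -> Counter:
    """A text is a set of sentence trees: sum each subtree's frequency over all of them."""
    total: Counter = Counter()
    for tree in sentences:
        total.update(subtree_counts(tree))
    return total


def predict_rule(rule, feats: Counter) -> int:
    """Hypothetical rule (t, xi, c): predict c if subtree t occurs >= xi times, else -c."""
    t, xi, c = rule
    return c if feats.get(t, 0) >= xi else -c


def train(texts, labels, rounds: int = 10):
    """Plain discrete AdaBoost over frequency-threshold rules (exhaustive search)."""
    feats = [text_features(s) for s in texts]
    n = len(feats)
    w = [1.0 / n] * n
    candidates = sorted({t for f in feats for t in f})
    model = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for t in candidates:
            # Candidate thresholds are the nonzero frequencies seen in training.
            for xi in sorted({f.get(t, 0) for f in feats} - {0}):
                for c in (1, -1):
                    err = sum(wi for wi, f, y in zip(w, feats, labels)
                              if predict_rule((t, xi, c), f) != y)
                    if err < best_err:
                        best, best_err = (t, xi, c), err
        if best is None:
            break
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break  # training texts are already separated
        # Re-weight: misclassified texts gain weight, then normalize.
        w = [wi * math.exp(-alpha * y * predict_rule(best, f))
             for wi, f, y in zip(w, feats, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def classify(model, sentences: list[Node]) -> int:
    """Sign of the alpha-weighted vote of the learned rules."""
    f = text_features(sentences)
    score = sum(alpha * predict_rule(rule, f) for alpha, rule in model)
    return 1 if score >= 0 else -1


def sent(word: str) -> Node:
    """Toy sentence tree: a root S with a single word child."""
    return Node("S", [Node(word)])


# Toy data: both texts contain the same subtrees; only their frequencies differ.
text_pos = [sent("good"), sent("good"), sent("fine")]
text_neg = [sent("good"), sent("fine"), sent("fine")]
model = train([text_pos, text_neg], [1, -1])
print(model[0][1])                # (('S', 'fine'), 2, -1)
print(classify(model, text_pos))  # 1
print(classify(model, text_neg))  # -1
```

Because the two toy texts share exactly the same set of subtrees, no presence-only rule can separate them; the learned rule instead thresholds how often the edge S→fine occurs, which is the kind of frequency information the abstract argues sentence-level methods ignore.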

Bibliographic information

Source: International Conference on Recent Advances in Natural Language Processing | 2013 | pp. 319-326 | 8 pages
Author: Tomoya Iwakura
Original format: PDF

Similar documents

1. A SEMI-STRUCTURED TEXTS CLUSTERING ALGORITHM [J]. ZHANG PEI YUN, CHEN EN HONG, HUANG BO. Journal of Theoretical and Applied Information Technology, 2013, no. 3.
2. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications, 2012, no. 3/4.
3. A Boosting-based Algorithm for Classification of Semi-Structured Text using Frequency of Substructures [C]. Tomoya Iwakura. International Conference on Recent Advances in Natural Language Processing, 2013.
4. Design of multiple frequency continuous wave radar hardware and micro-Doppler based detection and classification algorithms [D]. Anderson, Michael Glen, 2008.
5. Teleconsultations between Patients and Healthcare Professionals in Primary Care in Catalonia: The Evaluation of Text Classification Algorithms Using Supervised Machine Learning [O]. Francesc López Seguí, Ricardo Ander Egg Aguilar, Gabriel de Maeztu, 2020.
6. From faceted classification to knowledge discovery of semi-structured text records [O]. Goh Yee M., Giess Matt, McMahon Chris, 2009.
