
A LABEL-ORIENTED APPROACH FOR TEXT CLASSIFICATION



Abstract

Text classification is a well-known problem in the machine learning community. A widely used approach is based on the Term Frequency-Inverse Document Frequency (TF-IDF) feature. This feature represents the characteristics of a text very well, but it cannot clearly represent the relationship between a text and its assigned label. This paper presents a Label-Oriented (LO) approach to the text classification problem. The approach takes the relationship between a text and its assigned label into account by introducing a new feature, the label-oriented score. This score represents how important a term is among all terms and texts assigned to the label, compared with all terms and texts not assigned to the label. In the training phase, the model calculates the label-oriented score of each term with respect to a label. In the testing phase, the sum of the scores of all terms in a text determines whether the text should be assigned to the label. The proposed model is evaluated in two cases: short texts and regular texts. The experimental results indicate that the proposed model is significantly better than the baseline models on the considered datasets.
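
The abstract does not state the formula for the label-oriented score, so the following is only a minimal sketch of the train-then-sum workflow it describes, not the authors' method. The scoring rule (a smoothed log-ratio of a term's relative frequency in texts assigned to the label versus texts not assigned to it), the whitespace tokenizer, the names `train_label_scores` and `predict`, and the parameters `alpha` and `threshold` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_label_scores(texts, labels, target, alpha=1.0):
    """Training phase: compute one score per term for the label `target`.

    ASSUMPTION: the score is a smoothed log-ratio of term frequencies
    inside vs. outside the label. The paper's actual formula may differ.
    """
    assigned, unassigned = Counter(), Counter()
    for text, label in zip(texts, labels):
        bucket = assigned if label == target else unassigned
        bucket.update(text.lower().split())  # naive whitespace tokenizer

    vocab = set(assigned) | set(unassigned)
    # Add-alpha smoothing keeps the logarithm defined for unseen terms.
    assigned_total = sum(assigned.values()) + alpha * len(vocab)
    unassigned_total = sum(unassigned.values()) + alpha * len(vocab)

    return {
        term: math.log((assigned[term] + alpha) / assigned_total)
        - math.log((unassigned[term] + alpha) / unassigned_total)
        for term in vocab
    }


def predict(text, scores, threshold=0.0):
    """Testing phase: sum the scores of the text's terms and compare
    the sum with a threshold (hypothetical decision rule)."""
    total = sum(scores.get(term, 0.0) for term in text.lower().split())
    return total > threshold
```

With one such score table per label, a text would be checked against each label independently. Terms never seen in training contribute nothing to the sum in this sketch.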
