A new Centroid-Based Classification model for text categorization

Abstract

Automatic text categorization has attracted significant research attention because of the growing availability of online text, and many learning approaches have been developed for it. Among them, the Centroid-Based Classifier (CBC) is widely used because of its theoretical simplicity and computational efficiency. However, the classification accuracy of CBC depends heavily on the data distribution: when the distribution is highly skewed, CBC yields a misfit model and classifies poorly. This paper proposes a new classification model, the Gravitation Model (GM), to address the class-imbalanced classification problem. In the training phase, each class is weighted by a mass factor, learned from the training data, that reflects the data distribution of that class. In the testing phase, a new document is assigned to the class that exerts the maximum gravitational force on it. Experiments on twelve real datasets, comparing GM with CBC and its variants, show that the proposed model consistently outperforms both CBC and the Class-Feature-Centroid Classifier (CFC). It also achieves classification accuracy competitive with the DragPushing (DP) method while maintaining more stable performance. The proposed gravitation model is thus shown to be less prone to over-fitting and to have higher learning ability than the CBC model. (C) 2017 The Authors. Published by Elsevier B.V.
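
The abstract describes the two phases only in outline, so the following is a minimal sketch of the idea rather than the authors' algorithm. It assumes the force exerted by a class is its mass times the cosine similarity between the document and the class centroid, and it learns the masses with a simple error-driven multiplicative update. The force formula, the update rule, and the names `force`, `learn_masses`, `epochs`, and `step` are all illustrative assumptions that do not come from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def centroids(docs, labels):
    """Plain CBC step: sum the unit-length documents of each class, then normalize."""
    sums = {}
    for vec, label in zip(docs, labels):
        unit = normalize(vec)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
    return {label: normalize(acc) for label, acc in sums.items()}


def force(doc, centroid, mass):
    # ASSUMPTION: force = class mass * cosine similarity. The abstract does not
    # give the paper's actual force formula.
    return mass * cosine(doc, centroid)


def predict(doc, cents, masses):
    """Assign the document to the class exerting the largest force."""
    return max(cents, key=lambda c: force(doc, cents[c], masses[c]))


def learn_masses(docs, labels, cents, epochs=20, step=0.05):
    # ASSUMPTION: a hypothetical error-driven rule. On each training error,
    # grow the mass of the true class and shrink the mass of the wrongly
    # predicted class. Stop early once an epoch produces no errors.
    masses = {c: 1.0 for c in cents}
    for _ in range(epochs):
        errors = 0
        for vec, label in zip(docs, labels):
            guess = predict(vec, cents, masses)
            if guess != label:
                masses[label] *= 1 + step
                masses[guess] *= 1 - step
                errors += 1
        if errors == 0:
            break
    return masses


# Toy skewed data: class "a" is spread widely, class "b" is compact.
docs = [[1, 0], [1, 0], [1, 0], [1, 1], [0, 1], [5, 6]]
labels = ["a", "a", "a", "a", "b", "b"]

cents = centroids(docs, labels)
unit_masses = {c: 1.0 for c in cents}      # all masses 1.0 reduces to plain CBC
masses = learn_masses(docs, labels, cents)

print("plain CBC:", [predict(d, cents, unit_masses) for d in docs])
print("with masses:", [predict(d, cents, masses) for d in docs])
print("masses:", masses)
```

On this toy set, setting every mass to 1.0 reproduces the plain centroid rule, which pulls the borderline class "a" document `[1, 1]` toward the compact class "b". Under the assumed update, the learned masses enlarge the reach of the widely spread class enough to recover it. Real use would start from sparse tf-idf vectors and hold out data to check for over-fitting.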

Bibliographic record

  • Source
    Knowledge-Based Systems | 2017, Issue 15 | pp. 15-26 | 12 pages
  • Author affiliations

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China|Shanghai Hefu Artificial Intelligence Technol Grp, Hefu Inst UESTC, Chengdu 611731, Sichuan, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

    Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA;

    Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Text categorization; Centroid-Based Classifier; Machine learning; Gravitation Model;
